Multi-camera homogeneous object alignment

ABSTRACT

A first digital representation in an image and a second digital representation in a second image are identified based at least in part on specified criteria. A first epipolar line in the second image is determined based at least in part on a position of the first digital representation in the image. A second epipolar line is determined based at least in part on a position of the second digital representation in the second image. At least one cost value is determined based at least in part on the first digital representation, the second digital representation, the first epipolar line, and the second epipolar line. The first digital representation and the second digital representation are determined, based at least in part on the at least one cost value, to represent a same object. The first digital representation is associated in a data store with the second digital representation.

CROSS REFERENCE TO RELATED APPLICATION

This application incorporates by reference for all purposes the full disclosure of co-pending U.S. patent application Ser. No. 15/979,210, filed concurrently herewith, entitled “MULTI-CAMERA HOMOGENEOUS OBJECT TRAJECTORY ALIGNMENT”.

BACKGROUND

In an image frame of a video recording, it can be difficult for a computing device to distinguish one object in the image frame from another object in the image frame, particularly when the objects are relatively homogenous in size and shape. It is even more challenging for computing devices to identify which of the homogenous objects in the scene of a video recorded by one video camera correspond to homogenous objects in another video simultaneously recorded by another video camera from a different perspective. The difficulty is exacerbated if the objects are animate and changing in position and orientation between image frames of the videos. Tracking the separate trajectories of homogenous objects in the videos presents another challenge, as the homogeneity of the objects makes it difficult for a computing device to determine which object is associated with which trajectory (particularly if the objects cross paths, are near to each other, or enter or exit from outside the field of view of the video cameras).

BRIEF DESCRIPTION OF THE DRAWINGS

Various techniques will be described with reference to the drawings, in which:

FIG. 1 illustrates an example of multi-camera image capture of a shared scene in accordance with an embodiment;

FIG. 2 illustrates another example of multi-camera image capture of a shared scene in accordance with an embodiment;

FIG. 3 illustrates an example of an epipolar constraint in accordance with an embodiment;

FIG. 4 illustrates an example of transforming an image in accordance with an embodiment;

FIG. 5 illustrates an example of determining a region of interest of a homography constraint in accordance with an embodiment;

FIG. 6 is a flowchart that illustrates an example of computing epipolar constraint matrices in accordance with an embodiment;

FIG. 7 is a flowchart that illustrates an example of determining a homography constraint in accordance with an embodiment;

FIG. 8 illustrates an example of object tracking in accordance with an embodiment;

FIG. 9 is a flowchart that illustrates an example of object tracking in accordance with an embodiment;

FIG. 10 illustrates an example of matching trajectories in accordance with an embodiment;

FIG. 11 illustrates an example of detecting misattributed trajectories in accordance with an embodiment;

FIG. 12 is a flowchart that illustrates an example of validating trajectories in accordance with an embodiment; and

FIG. 13 illustrates an environment in which various embodiments can be implemented.

DETAILED DESCRIPTION

Techniques and systems described below relate to multi-camera object alignment and object trajectory alignment. In one example, a first image and a second image are obtained via a plurality of image capture devices sharing different perspective views of a common scene. A first point corresponding to an object identified in the first image is determined. A line in the second image corresponding to the first point in the first image is determined based at least in part on the first point and relative positions of the plurality of image capture devices. A second point corresponding to a first set of pixels and a third point corresponding to a second set of pixels are determined in the second image. A first distance from the second point to the line and a second distance from the third point to the line are determined. A set of cost values is calculated based at least in part on the first distance and the second distance. Based at least in part on the set of cost values, the first set of pixels is determined to represent, in the second image, the object identified in the first image, and the first set of pixels is associated with the object.

In another example, a sequence of images recorded by an image capture device is obtained, with the sequence including a first image and a second image and with the first image including a first representation of an object. A position of the first representation in the first image is determined. A predicted position for a representation of the object in the second image is generated at least in part by providing the position as input to a prediction algorithm. A second representation is identified, based at least in part on a distance between the predicted position and the second representation, as representing the object in the second image. A trajectory between the position of the first representation and a position of the second representation is generated. A request for a state of the object at a moment in time captured by the image capture device is received. Based at least in part on the trajectory and the moment in time, the state of the object at the moment is determined, and the state of the object is provided in response to the request.

In the preceding and following description, various techniques are described. For purposes of explanation, specific configurations and details are set forth in order to provide a thorough understanding of possible ways of implementing the techniques. However, it will also be apparent that the techniques described below may be practiced in different configurations without the specific details. Furthermore, well-known features may be omitted or simplified to avoid obscuring the techniques being described.

Techniques described and suggested in the present disclosure improve the field of computing, especially the field of digital object kinematics, by tracking the motion of objects in a three-dimensional space depicted in a sequence of two-dimensional images. Additionally, techniques described and suggested in the present disclosure improve the efficiency of digital object identification and matching by using an algorithm that efficiently estimates a likely match between objects in two images captured from different perspectives. Moreover, techniques described and suggested in the present disclosure are necessarily rooted in computer technology in order to overcome problems specifically arising with the problem of matching and distinguishing homogenous objects in digital images.

FIG. 1 illustrates an example embodiment 100 of the present disclosure. Specifically, FIG. 1 depicts a physical environment 102 containing a plurality of objects 106 being visually recorded by a series of recording devices 112, each having a field of view 104 of the physical environment 102.

When two cameras, such as the recording devices 112A-12B, view a scene in the physical environment 102 from two distinct positions, epipolar geometry may be utilized to match an object in the image captured by the first recording device 112A to its corresponding object in the image captured by the second recording device 112B. In various embodiments, the term “match” does not necessarily indicate equality. That is, two objects in separate images may be said to match if they correspond to a common object or satisfy one or more matching criteria. Likewise, two trajectories of objects in separate sequences of images may match if they correspond to the same object or satisfy one or more matching criteria. Generally, any way of determining a match may be utilized.

In some embodiments, in an initial stage, points of reference in the background are determined. For example, the physical environment 102 depicts a sports field with various parallel lines. As can be seen, parallel lines 110A-10C pass through both fields of view of the recording devices 112A-12B. The real-world coordinates for the parallel lines 110A-10C may be determined, and, likewise, the picture element (pixel) coordinates corresponding to the parallel lines 110A-10C in the captured images may be determined.

The parallel lines 110A-10C may be used to determine epipolar lines between an image captured by the first recording device 112A and an image captured by the second recording device 112B. A pair of cost matrices (one for each of the pair of images captured by the first recording device 112A and the second recording device 112B) may then be generated, each comprising distances (e.g., in pixels) of detected objects to the epipolar lines.

The physical environment 102 may be a real (e.g., non-virtual) location, at least a portion of which is being recorded as a sequence of images by one or more image capture devices (e.g., the recording devices 112). For example, FIG. 1 depicts a sports field as one example of the physical environment 102. However, FIG. 1 is intended to be illustrative only, and it is contemplated that techniques of the present disclosure may be used in other types of physical environments, such as in areas under surveillance by security cameras, roads and/or other areas being recorded by image sensors on an automobile, etc.

The fields of view 104A-04B may be the extent of the physical environment 102 that is captured by the respective recording devices 112A-12B. The fields of view 104A-04B may be solid angles (e.g., two-dimensional angles in three-dimensional space that an object subtends at a point) through which elements (e.g., pixel sensors) of the recording devices 112A-12B are sensitive to electromagnetic radiation at any one time.

The objects 106 may be a plurality of objects that are within the fields of view 104A-04B, at least a subset of which are captured (i.e., within a shared view) by both recording devices 112A-12B. In some embodiments, the objects 106 are individuals, such as members of sports teams in the physical environment 102. However, it is also contemplated that techniques of the present disclosure are applicable with objects 106 that are either animate or inanimate, and/or include one or more of a living (e.g., animal, plant, etc.) or non-living entity (e.g., boulder, automobile, building, etc.).

In some implementations, the objects 106 have certain visual characteristics (e.g., shape, color, pattern, etc.) usable by systems of the present disclosure to distinguish the objects 106 from the background of the physical environment 102 and/or from objects that are not of interest in the particular application of these techniques. For example, in one implementation the system identifies occurrences of “football helmets” in images captured by the recording devices 112A-12B as being the objects 106 of interest (football helmets having characteristics of being of particular shape and/or color); in this manner, background objects (e.g., the football, goal posts, hash marks, referees, spectators, etc.) incidentally captured in images captured by the recording devices 112A-12B may be excluded from the objects 106 of interest identified by the system.

In some applications, the objects are generally homogeneous. In some examples, the term “homogenous” refers to uniformity (e.g., in size, color, or shape) within an image such that one object of the objects is not consistently visually identifiable from another of the objects. Within an image, the object may be represented by a set of pixels, with the size of the set of pixels being affected both by the distance of the image capture device from the object as well as by the resolution of the image. For example, during an American football game, players of a particular team wear helmets of the same size, shape, and color combination, and a set of pixels representing the helmet of one player may not include sufficient distinguishing characteristics to distinguish it from a set of pixels representing the helmet of another player. An object may be considered homogenous even if it includes certain distinguishing visible characteristics, if the object is nevertheless not distinguishable from the other objects due to the positions and orientations of the objects. For example, players of a particular sports team may wear uniforms with the same colors and/or patterns as other players, but may have numbers and/or names printed on the uniforms for identification purposes; however, in any given image, an identifying mark may be obscured or turned away from the image capture devices such that the identity of the object (e.g., player) is uncertain.

The recording devices 112A-12B may be devices for electronic motion picture acquisition or electronic still picture acquisition. In embodiments, the recording devices 112A-12B include an image sensor (e.g., charge-coupled device (CCD) or complementary metal-oxide-semiconductor (CMOS)), memory, image processing capability, and/or a microphone. The recording devices 112A-12B may be designed to record and/or communicate a digital or analog stream of media (e.g., audio, video, text, or any combination of audio, video, or text) in a manner such that the media can be replayed or reproduced on a device designed to display such media. Examples of such recording devices include a digital video camera, a web camera, a mobile telephone, and so on. In embodiments, the recording devices 112A-12B are stationary. However, it is contemplated that certain techniques of the present disclosure may be applied to non-stationary recording devices. For example, a non-stationary recording device may follow an object in motion (e.g., keeping the object within its field of view).

FIG. 2 illustrates an aspect of an environment 200 in which an embodiment may be practiced. As illustrated in FIG. 2, the environment 200 may include a first image capture device 212A that captures a first image 204A simultaneous with a second image capture device 212B that captures a second image 204B, where both images 204A-04B include digital representations 216A-16B of an object 206 in the scene common to both image capture devices 212A-12B. In the first image 204A, a reference point 210A on the digital representation 216A is seen to correspond to a point (e.g., the center of the helmet of the object 206) on the object 206. The point 210A in the first image can be seen to correspond to an epipolar line 210B in the second image 204B.

The images 204A-04B may be two-dimensional digital images captured by respective image capture devices 212A-12B, such as the image capture devices 112A-12B of FIG. 1. The fields of view of the image capture devices 212A-12B may overlap (e.g., such as can be seen in FIG. 1), resulting in each of the images 204A-04B having digital captures of objects common to both fields of view, such as the digital representations 216A-16B of the object 206. Each of the images 204A-04B may be a numeric (e.g., binary) representation of a two-dimensional image that comprises a set of picture elements (pixels). Each of the images 204A-04B may contain a fixed number of rows and columns of pixels holding values that represent the brightness of a given color at that specific point. The images 204A-04B may be formatted according to a specific image format, such as Graphics Interchange Format (GIF), Joint Photographic Experts Group (JPEG), Portable Network Graphics (PNG), bitmap (BMP), or Tagged Image File Format (TIFF).

The object 206 may be an object similar to one of the objects 106 described in conjunction with FIG. 1. The image capture devices 212A-12B may be similar to the recording devices 112A-12B described in conjunction with FIG. 1. The digital representations 216A-16B may be sets of pixels in the respective images 204A-04B that represent the object 206.

The epipoles 218A-18B may be points where a line from the focal center of the first image capture device 212A to the focal center of the second image capture device 212B intersects their respective images 204A-04B. That is, the epipole 218A is the point of intersection in the image 204A of the line from the first image capture device 212A, and the epipole 218B is the point of intersection in the image 204B of the same line. The line that intersects the epipoles 218A-18B may be determined from information about the physical proximity and orientation of each of the cameras. The relationship between the positions of the cameras may be determined initially using static correspondence points in the background. For example, in an American football field, the yard lines are known to lie on a flat surface, to be 10 yards apart, and to be parallel to each other. Consequently, by detecting a pair of yard lines in an image, the pixel coordinates of the yard lines may be used to determine the angle, position, and distance of the camera relative to the field. By determining the angle, position, and distance of a pair of cameras relative to the field, the positions and orientations of the cameras' respective focal points can likewise be calculated.

Additionally or alternatively, the line that intersects the epipoles 218A-18B may be derived based on differences in perspectives of static features (e.g., yardage lines on a field, the base of a goal post, or other stationary object or background feature) common in the shared scene of the images 204A-04B. That is, given a determined point-to-point correspondence of sets of points in any two images of the same scene, the relationship between the two images may be expressed as a fundamental matrix, which is a 3×3 matrix that constrains where the projection of points from the scene can occur in both images. The sets of points used to generate the fundamental matrix may be determined using static features (e.g., yardage lines, stationary objects, or some other static background features) in each of the images. Given the projection of a scene point into one of the images, the corresponding point in the other image is constrained to a line. In some examples, an “epipolar constraint” refers to the relation between corresponding image points that the fundamental matrix represents.
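
The following is a minimal sketch, not part of the disclosure, of how such a fundamental matrix might be estimated from static-feature correspondences using OpenCV. The file-free pixel coordinates are hypothetical placeholders standing in for detected background features.

```python
# Sketch: estimating a fundamental matrix from hypothetical pixel
# correspondences of static background features in image A and image B.
import numpy as np
import cv2

pts_a = np.float32([[412, 310], [655, 298], [890, 290], [430, 520],
                    [670, 505], [905, 495], [450, 730], [685, 712]])
pts_b = np.float32([[150, 340], [402, 335], [648, 330], [175, 545],
                    [420, 538], [660, 530], [200, 748], [440, 740]])

# The 8-point algorithm needs at least eight correspondences; for a
# well-conditioned estimate the features should not all be coplanar.
F, mask = cv2.findFundamentalMat(pts_a, pts_b, cv2.FM_8POINT)

# F constrains corresponding points: x_b^T . F . x_a ~ 0 in homogeneous coords.
x_a = np.array([412.0, 310.0, 1.0])
x_b = np.array([150.0, 340.0, 1.0])
print(x_b @ F @ x_a)  # close to zero for a true correspondence
```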

The reference point 210A may be a point on or proximate to the digital representation 216A of the object 206. The reference point 210A may be used to indicate or represent the position of the digital representation 216A in the image 204A. That is, the reference point 210A may refer to a point on or proximate to the object 206 in the real world. In the illustrative example of FIG. 2, the reference point 210A corresponds to a point in the center of the helmet of the object 206. As can be seen, the point in the center of the helmet of the object 206 and the focal centers of the image capture devices 212A-12B form an epipolar plane. The intersection of this plane and the image 204B therefore yields the epipolar line 210B, which, under ideal conditions, intersects the digital representation 216B at the point corresponding to the same point on or proximate to the object 206 as the reference point 210A. Thus, given a reference point 210A (x) corresponding to a point in the digital representation 216A of the object 206 in the image 204A, and a point representing the epipole 218A (e) (which may be derived using the fundamental matrix) corresponding to an intersection in the image 204A of the line between the focal centers of each camera, the epipolar line 210B (L) may be calculated according to the formula:

L = e × x
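
As an illustration only, the cross-product formula above can be evaluated directly on homogeneous pixel coordinates; the epipole and reference-point values below are assumptions, and in practice the epipole can be recovered as a null vector of the fundamental matrix.

```python
# Sketch: evaluating L = e x x on homogeneous coordinates (hypothetical values).
import numpy as np

e = np.array([1850.0, 540.0, 1.0])   # epipole, homogeneous pixel coordinates
x = np.array([642.0, 388.0, 1.0])    # reference point (e.g., helmet center)

L = np.cross(e, x)                   # line coefficients [a, b, c]: ax + by + c = 0

# Equivalently, the epipolar line in the other image can be obtained
# directly from the fundamental matrix: L_other = F @ x.
```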

FIG. 3 illustrates an example 300 of an embodiment of the present disclosure. Specifically, FIG. 3 depicts a process for quantifying costs of associating objects 306 in a first image 304A with objects 306 in a second image 304B, where the objects 306 were captured by image capture devices (e.g., the recording devices 112 of FIG. 1) from different positions. The costs are quantified based on distances 314A-14D of each object in one image (e.g., the first image 304A) to an epipolar line 310A generated based on a reference point 310B of an object in the other image (e.g., the second image 304B).

The images 304A-04B may be similar to the images 204A-04B described in conjunction with FIG. 2.

For example, the objects 306 may be digital representations of animate, real-world objects, and therefore their two-dimensional shapes as captured in digital representations (e.g., sets of pixels of the images 304A-04B) may change between frames in a sequence of digital image captures due to the real-world objects being in motion. Likewise, because the digital representations of the images 304A-04B are captured from different perspectives, the two-dimensional shape of a particular object's representation in the first image 304A may not match its corresponding two-dimensional shape in the second image 304B (e.g., a silhouette of an object from the side may not match a silhouette of the same object from the back unless the object is rotationally symmetric). Furthermore, in a situation where the objects 306A-06B share visual similarities (e.g., such as a sporting event where players of the same team are the same general shape and wear the same uniform and color combinations, or capturing images of a group of animals of the same species), matching objects by their shape and colors may not provide sufficient confidence that a set of pixels in the first image 304A and a set of pixels in the second image 304B are representative of the same object. Techniques described in the present disclosure, however, provide a mechanism for matching objects in one image with their corresponding objects in another image.

In some implementations, objects being tracked may have a common feature allowing the objects 306 to be distinguished from other objects (e.g., sports players distinguished from referees in the same field of view) or from the background (e.g., sports players from the sports ball in the same field of view). For example, because an American football helmet is of a generally round shape from most angles, a system of the present disclosure may identify the objects 306 of interest by identifying the helmets in the image using one or more object detection techniques. The object detection techniques utilized may include one or more of edge detection, corner detection, blob detection, or ridge detection. Examples of such techniques include the Canny edge detector, Sobel operator, Harris & Stephens/Plessey/Shi-Tomasi corner detection algorithms, SUSAN corner detector, level curve curvature, features from accelerated segment test (FAST), Laplacian of Gaussian (LoG), difference of Gaussians (DoG), Monge–Ampère operator, maximally stable extremal regions (MSER), principal curvature-based region detector (PCBR), and grey-level blobs. However, since one helmet may be difficult to distinguish from other helmets of the same team, techniques of the present disclosure may be used to match an object in the first image 304A with its corresponding object in the second image 304B even though the images 304A-04B were captured from different perspectives and positions.
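
As one possible illustration of the blob-detection option named above (not the disclosure's specific detector), a Laplacian-of-Gaussian blob detector could be used to find roughly round, helmet-sized regions; the file name and sigma/threshold parameters here are assumptions that would be tuned per camera.

```python
# Sketch: LoG blob detection of round, helmet-sized objects in one frame.
from skimage import io, color
from skimage.feature import blob_log

frame = color.rgb2gray(io.imread("frame_cam_a.png"))  # hypothetical frame

# Each row of `blobs` is (row, col, sigma); blob radius ~ sigma * sqrt(2).
blobs = blob_log(frame, min_sigma=4, max_sigma=12, num_sigma=9, threshold=0.1)

reference_points = [(col, row) for row, col, sigma in blobs]
print(f"detected {len(reference_points)} candidate objects")
```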

As described above, the reference point in one image will correspond to an epipolar line in the other image. Likewise, within the shared fields of view of the images, each pixel (e.g., a center point of a helmet) in one image will correspond to an epipolar line in the other image. Ideally, for a reference point in the second image 304B, an epipolar line would be generated to directly pass through the equivalent point in the first image 304A; for example, a line generated in the first image 304A based on the center point of a helmet in the second image 304B would ideally pass through the center point of the corresponding helmet in the first image 304A. However, due to various factors (e.g., camera jitter, precision of object detection, precision of distance and angle relationships between the cameras, etc.) the line generated for the first image 304A based on the reference point in the second image 304B may not always pass through the point in the first image 304A exactly equivalent to the reference point.

Selecting the closest object to the epipolar line in the first image, however, runs the risk that the selected object may actually be associated with a different object in the second image 304B. Thus, the present disclosure provides a technique for quantifying the cost of each potential assignment of an object in the first image to an object in the second image using an epipolar constraint and a homography constraint. For the epipolar constraint, a cost for each potential assignment may be determined, and then a homography constraint may be applied to make the assignment based upon the cost matrix.

The objects 306 may be sets of pixel values representing real-world objects in a scene that was captured as one or more digital images, such as the images 304A-04B. As pictured in the example 300, the objects 306 represent human figures in the scene. The reference point 310B may be a point (e.g., a pixel position) associated with an object (e.g., one of the objects 306) in an image (e.g., the image 304B). Relatedly, the epipolar line 310A may be an epipolar line generated in the other image (e.g., the image 304A) based on the reference point 310B. As can be seen in the illustrative example 300, the epipolar line 310A does not directly pass through any of the objects 306 in the image 304A. Consequently, as described below, a cost matrix will be generated based in part on the distances 314A-14D as part of the process for determining which of the objects 306 in the image 304A corresponds to the reference point 310B.

The distances 314A-14D may be distances measured from a reference point of the respective objects 306 to the epipolar line 310A. In some embodiments, the distances 314A-14D are measured in number of pixels between the reference point and the epipolar line 310A; however, it is contemplated that other units of measure may be used. The distances 314A-14D may be the shortest distances between the reference point and the epipolar line 310A (e.g., the length of the line from the reference point to a point of perpendicular (i.e., normal) intersection with the epipolar line 310A).
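
A minimal sketch of this perpendicular point-to-line distance, assuming the epipolar line is kept in homogeneous form [a, b, c] (so that ax + by + c = 0) and using hypothetical values:

```python
# Sketch: shortest (perpendicular) pixel distance from a reference point
# to an epipolar line given as homogeneous coefficients [a, b, c].
import numpy as np

def distance_to_epipolar_line(point_xy, line_abc):
    """Perpendicular distance, in pixels, from a point to a line."""
    x, y = point_xy
    a, b, c = line_abc
    return abs(a * x + b * y + c) / np.hypot(a, b)

L = np.array([0.31, -0.95, 120.0])   # hypothetical epipolar line coefficients
helmet_center = (642.0, 388.0)       # hypothetical detected reference point
print(distance_to_epipolar_line(helmet_center, L))
```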

In an embodiment, a cost matrix is generated for each of the images 304A-04B, where components of the cost matrices include, for each of the objects 306, a distance (e.g., the distances 314A-14D) from a reference point (e.g., the center point of a helmet) to each epipolar line (e.g., the epipolar line 310A). For example, in a situation where the first image 304A has m objects and the second image 304B has n objects, the following cost matrix may be created for the first image:

$\begin{bmatrix} dA_{11} & \ldots & dA_{1n} \\ \vdots & \ddots & \vdots \\ dA_{m1} & \ldots & dA_{mn} \end{bmatrix}$

where $dA_{11}$ is the shortest distance of a 1st object in the second image 304B to a 1st line in the second image 304B generated based on a point of a 1st object in the first image 304A, $dA_{1n}$ is the shortest distance of an nth object in the second image 304B to the first line in the second image 304B, $dA_{m1}$ is the shortest distance of a 1st object in the second image 304B to an mth line in the second image 304B generated based on a point of an mth object in the first image 304A, and $dA_{mn}$ is the shortest distance of an nth object in the second image 304B to the mth line in the second image 304B (i.e., the subscripts are ordered as d[istance]_[image A object][image B object]).

In a similar manner, a cost matrix may be created for the second image:

$\begin{bmatrix} dB_{11} & \ldots & dB_{1m} \\ \vdots & \ddots & \vdots \\ dB_{n1} & \ldots & dB_{nm} \end{bmatrix}$

where $dB_{11}$ is the shortest distance of a 1st object in the first image 304A to a 1st line in the first image 304A generated based on a point of a 1st object in the second image 304B, $dB_{1m}$ is the shortest distance of an mth object in the first image 304A to the first line in the first image 304A, $dB_{n1}$ is the shortest distance of a 1st object in the first image 304A to an nth line in the first image 304A generated based on a point of an nth object in the second image 304B, and $dB_{nm}$ is the shortest distance of an mth object in the first image 304A to the nth line in the first image 304A (i.e., the subscripts are ordered as d[istance]_[image B object][image A object]).

The cost matrices (the cost matrix for the first image and the cost matrix for the second image) may be used to determine the cost of assigning a particular object in one image to an object in the other image using a homography constraint, as explained in detail below. Note that although techniques described in the present disclosure refer to a pair of images having a shared portion of a scene (e.g., recording devices 112A-12B of FIG. 1), it is contemplated that techniques of the present disclosure may be applicable to any number of images and image capture devices. For example, FIG. 1 depicts an example embodiment with at least 14 recording devices with two or more sharing portions of the same fields of view. More specifically, for an implementation where three cameras share a portion of the same view and simultaneously take image captures resulting in image A, image B, and image C, up to six matrices may be generated; i.e., two matrices for each of the three images. For an implementation where four cameras share a portion of the same view, up to 12 matrices may be generated; i.e., three matrices for each of the four images, and so on.
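
A minimal sketch of constructing the two cost matrices for a pair of images, assuming detected reference points in each image and a fundamental matrix F with the convention x_b^T F x_a = 0 (all inputs hypothetical):

```python
# Sketch: building the dA (m x n) and dB (n x m) epipolar-constraint cost
# matrices described above from detected reference points and F.
import numpy as np

def point_line_distance(p, line):
    a, b, c = line
    return abs(a * p[0] + b * p[1] + c) / np.hypot(a, b)

def epipolar_cost_matrices(points_a, points_b, F):
    m, n = len(points_a), len(points_b)
    cost_a = np.zeros((m, n))   # dA: distances of image B objects to lines from image A objects
    cost_b = np.zeros((n, m))   # dB: distances of image A objects to lines from image B objects
    for i, (xa, ya) in enumerate(points_a):
        line_in_b = F @ np.array([xa, ya, 1.0])      # epipolar line in image B
        for j, pb in enumerate(points_b):
            cost_a[i, j] = point_line_distance(pb, line_in_b)
    for j, (xb, yb) in enumerate(points_b):
        line_in_a = F.T @ np.array([xb, yb, 1.0])    # epipolar line in image A
        for i, pa in enumerate(points_a):
            cost_b[j, i] = point_line_distance(pa, line_in_a)
    return cost_a, cost_b
```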

FIG. 4 illustrates an example 400 of an embodiment of the present disclosure. Specifically, FIG. 4 depicts a process for transforming an image while determining a homography constraint for the purpose of assigning objects in one image to objects in another image. If the background surface 402 is flat (such as the sports field of the physical environment 102 depicted in FIG. 1), the first and second images may be modified (e.g., stretched), based on information relating to how the image capture devices (e.g., 212 of FIG. 2) are positioned relative to each other in the physical world, in order to align the image with the background surface. For example, as shown in the example 400, the image 404A, being taken from the perspective of the image capture device, renders the portion of a three-dimensional background surface in the physical world as an approximate trapezoid in two dimensions. The image 404B, however, shows the background surface 402 as it has been stretched to replicate the rectangular shape of the background surface in the physical world. The objects 406, as can be seen, because they are not planar, may become distorted in some manner in the image 404B due to the stretching. Moreover, depending on the positions and perspectives of the image capture devices, objects in each of the images captured by the image capture devices may be distorted in differing amounts and in different directions.

The background surface 402 may be a representation of a surface in the physical environment. The background surface 402 may be represented in two dimensions in the images 404A-04B. The illustrative example 400 depicts a background surface of a sports field. The background surface 402 may be planar, although it is contemplated that techniques of the present disclosure may be applied to surfaces that are not necessarily flat or uniform in color or texture. In the physical world, the real-world objects represented by the objects 406 may be proximate to the surface and may interact with the surface.

The image 404A may be a digital image similar to the images 204A-04B described in conjunction with FIG. 2. As can be seen in the example 400, the first image 404A has been captured by an image capture device from a perspective above and to the side of the scene captured within the field of view of the image capture device. The second image 404B is a digital image produced by transforming the perspective of the first image 404A to another perspective; specifically, the second image 404B has been transformed into an overhead perspective.

The objects 406 may be numeric representations (e.g., sets of pixels) of objects in the scene captured in the images 404A-04B by image capture devices, similar to the digital representations 216A-16B of FIG. 2. As can be seen, however, the objects 406 in the second image appear “stretched” as a result of the transformation performed on the first image 404A to produce the second image 404B. That is, as a result of the background surface 402 in the first image 404A being transformed from an asymmetric quadrilateral shape to a rectangle in the second image 404B, the objects 406 have been stretched vertically and skewed in a clockwise direction. The more distant of the objects 406 may be transformed to a greater extent than the nearer object.
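
A minimal sketch of such a perspective (homography) warp using OpenCV; the corner coordinates, output size, and file name are hypothetical placeholders, not values from the disclosure.

```python
# Sketch: warping a camera image so the flat playing surface fills a
# rectangle, analogous to transforming image 404A into image 404B.
import numpy as np
import cv2

frame = cv2.imread("frame_cam_a.png")

# Pixel corners of the field as seen by the camera (trapezoid) ...
field_in_image = np.float32([[310, 220], [1610, 240], [1900, 1060], [40, 1040]])
# ... mapped to the corners of an overhead, to-scale rectangle.
overhead_size = (1200, 533)
field_overhead = np.float32([[0, 0], [1200, 0], [1200, 533], [0, 533]])

H = cv2.getPerspectiveTransform(field_in_image, field_overhead)
overhead = cv2.warpPerspective(frame, H, overhead_size)

# Individual reference points can be mapped the same way:
pts = np.float32([[[642, 388]]])              # helmet center in camera pixels
print(cv2.perspectiveTransform(pts, H))       # position in overhead coordinates
```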

FIG. 5 illustrates an example 500 of an embodiment of the present disclosure. Specifically, FIG. 5 illustrates determining regions of interest 518A-18B based on the distortion of digital representations 506A-06B of objects in respective images due to stretching the respective images to fit a background surface (as described in conjunction with FIG. 4, yielding the transformed images 504A-04B and the transformed digital representations 516A-16B). The transformed images 504A-04B may be transformations of digital images, such as the images 304A-04B of FIG. 3, as a result of transforming the objects and background surface depicted in the images from the perspective of the cameras to fit the dimensions of the background surface (e.g., an overhead view), such as in the manner described in conjunction with FIG. 4.

The digital representations 506A-06B may be sets of pixels representing the same physical object in respective images that share a scene, similar to the digital representations 306 of FIG. 3. For example, the digital representations 506A-06B may be sets of pixels representing football helmets in the scene captured in the respective images. For illustrative purposes, the digital representations 506A-06B are displayed as a simple circular shape to more clearly demonstrate how, as a result of modifying (e.g., stretching) portions of the original digital images to fit the dimensions of a physical surface in a scene (e.g., fit to an overhead view of the playing field), the digital representations 506A-06B may themselves be transformed into the modified digital representations 516A-16B. The digital representations 506A-06B are drawn with a dashed line to indicate how the digital representations 506A-06B would have appeared in the scene prior to the transformation of their respective images, and, in embodiments, would not actually appear in the transformed images 504A-04B.

The modified digital representations 516A-16B may be transformations of the digital representations 506A-06B as a result of the transformation performed to yield the transformed images 504A-04B. For illustrative purposes, the modified digital representations 516A-16B are displayed as a simple oval shape to more clearly demonstrate how the modified digital representations 516A-16B are transformed from the digital representations 506A-06B.

The regions of interest 518A-18B may be regions within which reference points of the respective objects represented by the digital representations 506A-06B may be found. That is, if the images are transformed in the manner described for FIG. 4 to fit the geometry of the physical surface (e.g., of the physical environment 102), the regions of interest 518A-18B may be the most likely locations of the reference points in the coordinates of the geometry of the physical surface.

The regions of interest 518A-18B (the homography constraint) may be combined with the matrices produced from the epipolar constraint in a variety of ways. For example, each of the distances (e.g., the distances 314A-14D of FIG. 3) of objects to the epipolar line that comprise the matrix of that image (as described above in conjunction with FIG. 3) may be weighted (e.g., by adding, multiplying, etc.) by the radius of the region of interest that corresponds to that object. Thus, for example, if the region of interest 518A corresponds to the object in the first image 304A having the distance 314A, the distance 314A element of the matrix for the first image 304A may be weighted by the radius of the region of interest 518A.
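
One way this weighting could look is sketched below, assuming multiplicative weighting and assuming each element is scaled by the region-of-interest radius of the object whose distance it records; the disclosure leaves both choices open (e.g., adding is also mentioned), so this is illustrative only.

```python
# Sketch: weighting the epipolar cost matrices by region-of-interest radii.
import numpy as np

def apply_homography_weight(cost_a, cost_b, roi_radius_a, roi_radius_b):
    # cost_a[i, j]: distance of image B object j to the line from image A object i,
    # scaled here by the ROI radius of image B object j.
    weighted_a = cost_a * np.asarray(roi_radius_b)[None, :]
    # cost_b[j, i]: distance of image A object i to the line from image B object j,
    # scaled here by the ROI radius of image A object i.
    weighted_b = cost_b * np.asarray(roi_radius_a)[None, :]
    return weighted_a, weighted_b
```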

From there, the matrices for each of the images may be combined. The combination may be performed in a variety of ways. For example, one of the matrices produced from the epipolar constraint as described above in conjunction with FIG. 3 may be transposed and added to the other matrix to produce an m×n or n×m matrix. In this manner, the epipolar constraint and the homography constraint may be combined. The objects in the images may be matched by determining which combination of matches results in the lowest estimated cost when the matrices or combined matrix is provided as input to a cost estimation algorithm (also referred to as the lowest-cost assignment). For example, a combinatorial optimization algorithm, such as the Hungarian algorithm, may be applied to the resultant matrix to determine which object-to-object assignment combination results in the lowest overall cost. In some examples, a combinatorial optimization algorithm is an algorithm constructed to find an optimum (e.g., shortest path) solution in a finite set of items. The combination with the lowest overall cost may be used as the final object-to-object mapping between the images. Note, however, that while a given combinatorial optimization algorithm may not necessarily result in a correct or even an optimal assignment, the lowest-cost assignment may provide a suitable initial object-to-object assignment.
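
A minimal sketch of the combine-and-assign step, assuming the weighted matrices from the earlier sketches and using SciPy's implementation of the Hungarian algorithm (linear_sum_assignment):

```python
# Sketch: combining the transposed matrix with the other matrix and solving
# for the lowest-cost object-to-object assignment.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_objects(weighted_a, weighted_b):
    combined = weighted_a + weighted_b.T          # m x n combined cost matrix
    rows, cols = linear_sum_assignment(combined)  # minimizes total assignment cost
    # Each (i, j) pair assigns image A object i to image B object j.
    return list(zip(rows.tolist(), cols.tolist()))
```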

The association of an object in one image with an object in the other image may be stored in a data store for later use. Additionally or alternatively, this object-to-object association may be used to determine or validate the assignment of trajectories described in conjunction with FIGS. 8-11. Additionally or alternatively, the object-to-object association may be output to a user or caused to be displayed visually on a display device (e.g., a computer screen).

Neither the epipolar constraint nor the homography constraint necessarily relies on the image capture devices being calibrated or on determining a zoom level of the image capture devices. Use of the triangulation constraint, however, may be dependent upon having calibrated image capture devices. For the triangulation constraint, the three-dimensional position of a point in physical space may be determined based on coordinates of the point in a first image captured by a first image capture device, coordinates of the point in a second image captured by a second image capture device, and certain parameters (e.g., zoom level, etc.) of the image capture devices.

In embodiments, the triangulation constraint may be used to increase the accuracy of matching objects in one image with objects in another. For example, a pair of objects may be determined not to match each other if the triangulation constraint indicates that a reference point on one of the objects in one image, matched with a reference point on the other object in another image, would position the object below the physical surface (e.g., if a football player's calculated height is a negative value).

FIG. 6 is a flowchart/block diagram illustrating an example of a process 600 for computing a pair of epipolar constraint cost matrices in accordance with various embodiments. Some or all of the process 600 (or any other processes described, or variations and/or combinations of those processes) may be performed under the control of one or more computer systems configured with executable instructions and/or other data, and may be implemented as executable instructions executing collectively on one or more processors. The executable instructions and/or other data may be stored on a non-transitory computer-readable storage medium (e.g., a computer program persistently stored on magnetic, optical, or flash media).

For example, some or all of process 600 may be performed by any suitable system, such as a server in a data center, by various components of the environment 1300 described in conjunction with FIG. 13, such as the one or more web servers 1306 or the one or more application servers 1308, by multiple computing devices in a distributed system of a computing resource service provider, or by any electronic client device such as the electronic client device 1302. The process 600 includes a series of operations wherein the system identifies objects in each of two images and, for each object in one image, computes a line in the other image and determines the distances of the objects in that image to the line. Once all of the images have been cycled through, the system constructs matrices using the determined distances.

In 602, the system identifies objects in each of a pair of images. For example, the system may be configured to detect sets of pixels that match one or more of certain specified criteria, such as shape (e.g., round, oval, square, rectangular, or some other specific shape), color (e.g., white, red, green with a yellow stripe, etc.), or other characteristic usable to identify sets of pixels as corresponding to representations of the objects of interest.

As a specific example, the system may be configured to analyze the images for occurrences of American football helmets in the images. As noted, the objects may be detected using one or more of various techniques, such as Canny edge detection, Sobel operator, Harris & Stephens/Plessey/Shi-Tomasi corner detection algorithms, SUSAN corner detector, level curve curvature, FAST, LoG, DoG, MSER, or grey-level blobs.

In 604A, the system performing the process 600 locates the first (or next) object in the first image. The system may determine a reference point for the object in the first image, such as a pixel in the center of the identified object. Likewise, in 604B, the system may locate a first (or next) object in the second image. Similarly, the system may determine a reference point for the object in the second image.

In 606A, the system performing the process 600 calculates, based on the reference point determined in the first image in 604A and the determined relationship between the first image and the second image (see the description of FIG. 2), an epipolar line that crosses through the second image. Note that the epipolar line may not necessarily be “drawn” into the second image itself, but may simply be a computation of a line in the coordinate system represented by the second image. Similarly, in 606B, the system calculates, based on the reference point determined in the second image in 604B and the determined relationship between the first image and the second image, an epipolar line that crosses through the first image.

In 608A, the system performing the process 600 computes a distance from a first/next object in the second image to the epipolar line calculated in 606A. The distance may be computed from a reference point (e.g., the center of the detected object, the closest edge of the detected object, etc.) of the first/next object in the second image. Similarly, in 608B, the system computes a distance from the first/next object in the first image to the epipolar line calculated in 606B. The distance may be computed from a reference point of the first/next object in the first image. In embodiments, the distances are measured in pixels; however, it is contemplated that, in other implementations, other units of distance may be used (e.g., inches, centimeters, feet, etc.). In some implementations, the distances are straight-line distances. It is further contemplated that, in some implementations, the distances may be Manhattan distances, squared, or otherwise weighted.

In 610A, the system performing the process 600 determines whether the distances for all of the objects in the second image to the epipolar line determined in 606A have been computed. If so, the system proceeds to 612A. Otherwise, the system returns to 608A to compute the distance of the next object in the second image. Likewise, in 610B, the system determines whether the distances for all of the objects in the first image to the epipolar line determined in 606B have been computed. If so, the system proceeds to 612B. Otherwise, the system returns to 608B to compute the distance of the next object in the first image.

In 612A, the system determines whether, for each object in the first image, epipolar lines in the second image have been calculated. If so, the system proceeds to 614A. Otherwise, the system returns to 604A to determine a reference point for the next object in the first image. Likewise, in 612B, the system determines whether, for each object in the second image, epipolar lines in the first image have been calculated. If so, the system proceeds to 614B. Otherwise, the system returns to 604B to determine a reference point for the next object in the second image.

In 614A, the system performing the process 600 constructs a first cost matrix from the distances computed in the operations of 608A-12A. For example, rows of the matrix may represent the objects identified in the first image, while columns of the matrix may represent objects in the second image, or vice versa. Likewise, in 614B, the system constructs a second cost matrix from the distances computed in the operations of 608B-12B. Similarly, rows of the matrix may represent the objects identified in the first image, while columns of the matrix may represent objects in the second image, or vice versa. Note that it is contemplated that the matrices may be constructed dynamically during the operations of 604A-12A and 604B-12B rather than in separate operations.

In 616, the system performing the process 600 applies a homography constraint (see FIG. 7) to one or more of the elements of the cost matrices. For example, an element of the first cost matrix representing a distance from an object in the second image to an epipolar line may further be weighted by the size of a region of interest around the object after the object has been transformed in a manner described in FIGS. 4-5. Likewise, elements of the second cost matrix representing distances from objects in the first image to epipolar lines may further be weighted by the sizes of regions of interest around the transformed objects in the first image.

Finally, in 618, the system performing the process 600 matches the objects identified in the first image with the objects identified in the second image based on the first and second matrices. For example, the system may utilize the Hungarian algorithm or another combinatorial optimization algorithm to determine a lowest-cost object-to-object combination. Note that one or more of the operations performed in 602-18 may be performed in various orders and combinations, including in parallel. For example, although the operations of 604A-14A are illustrated and described above as occurring in parallel with 604B-14B, it is contemplated that these operations may be performed in series or as alternating operations. Also note that although the process 600 is described in conjunction with only two images, it is contemplated that the techniques may be applied to multiple images sharing the same scene. For example, the operations of 602 may identify objects in three images, in which case the operations of determining a reference point of a first/next object in one image, determining a line in another image, determining a distance from a first/next object in the other image to the line, and so on may be repeated for each pairwise combination (six pairwise combinations for three images: A-to-B, A-to-C, B-to-A, B-to-C, C-to-A, C-to-B).

FIG. 7 is a flowchart illustrating an example of a process 700 for applying a homography constraint in accordance with various embodiments. Some or all of the process 700 (or any other processes described, or variations and/or combinations of those processes) may be performed under the control of one or more computer systems configured with executable instructions and/or other data, and may be implemented as executable instructions executing collectively on one or more processors. The executable instructions and/or other data may be stored on a non-transitory computer-readable storage medium (e.g., a computer program persistently stored on magnetic, optical, or flash media).

For example, some or all of process 700 may be performed by any suitable system, such as a server in a data center, by various components of the environment 1300 described in conjunction with FIG. 13, such as the one or more web servers 1306 or the one or more application servers 1308, by multiple computing devices in a distributed system of a computing resource service provider, or by any electronic client device such as the electronic client device 1302. The process 700 includes a series of operations wherein an image is transformed such that a set of pixels fits a particular shape and a transformed object is identified. A region of interest is identified based on the transformed object, and an element of a cost matrix is modified based on the region.

In 702, the system performing the process 700 may transform an image of a scene, such as the image 404A of FIG. 4, in a manner such that an element of the scene fits a particular geometry, such as in the manner that the image 404B has been stretched so as to fit the background surface 402 to the geometry of an overhead view of the actual physical surface. In 704, the system performing the process 700 identifies a (transformed) object within the transformed image that corresponds to an object identified (e.g., in the operations of 602 of FIG. 6) in the untransformed image. The transformed objects may be identified in a similar manner as the untransformed objects, such as by using algorithms for edge detection, corner detection, blob detection, ridge detection, and so on.

In 706, the system performing the process 700 determines a region of interest associated with the transformed object. The region of interest may be based on various factors, such as the size and/or shape of the transformed object. For example, in some implementations, the region of interest may fit to the shape of the transformed object. In other implementations, the region of interest may be the area of a circle, or some other shape, that surrounds the transformed object.

In 708, the system modifies the value of an element corresponding to the untransformed object in a matrix (e.g., the matrix of 614A or 614B of FIG. 6) of the untransformed image. The value may be modified based on the size of the region of interest (e.g., radius (e.g., length in pixels), diameter, area (e.g., in pixels), etc.). The operations of 708 may correspond to the operations of 616; thus, the operations of 702-08 may be performed for each of the images being processed by the process 600.

In 710, the system utilizes the modified matrix to match the objects in one image to the objects in another image. For example, the modified matrix for a first image may be combined (e.g., added) with a modified matrix of another image, and then the Hungarian algorithm may be applied to find a lowest-cost assignment of objects in one image to objects in the other image. The operations of 710 may correspond to the operations of 618 of FIG. 6.

In some embodiments, the modified matrix may be further modified using a triangulation constraint. For example, if the positions, zoom levels, and relationships between the image capture devices that captured the scenes are known, a triangulation algorithm may be executed to estimate a three-dimensional position in physical space based on coordinates of an object in a first image captured by a first image capture device and coordinates of an object in a second image captured by a second image capture device. From that estimated position, an element in the matrix corresponding to the object-to-object mapping may be weighted based on an amount of deviation from an expected position in physical space. For example, objects may be expected to be found at a height of 0 to 7 feet in the air, and estimated object positions that deviate from this range may be penalized with an increased cost (matrix value).
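
A minimal sketch of such a triangulation-based penalty is shown below. It assumes calibrated cameras with known 3×4 projection matrices P_a and P_b, and a world frame in which z = 0 lies on the playing surface with z measured in feet; these are illustrative assumptions, not parameters taken from the disclosure.

```python
# Sketch: penalizing a candidate object-to-object pairing whose
# triangulated 3D position falls outside an expected height range.
import numpy as np
import cv2

def triangulation_penalty(P_a, P_b, pt_a, pt_b, max_height_ft=7.0, penalty=1000.0):
    pts_a = np.float64(pt_a).reshape(2, 1)
    pts_b = np.float64(pt_b).reshape(2, 1)
    X_h = cv2.triangulatePoints(P_a, P_b, pts_a, pts_b)   # 4x1 homogeneous point
    X = (X_h[:3] / X_h[3]).ravel()                          # 3D point in world frame
    height = X[2]
    # Pairings that place the object underground or implausibly high
    # receive a large additional cost added to the matrix element.
    return 0.0 if 0.0 <= height <= max_height_ft else penalty
```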

Note that the operations of 704-10 may be repeated for each of the objects within the image. Note too that the operations of 702-10 may be repeated for each of the images sharing a common scene for which matrices are generated in accordance with the process 600 of FIG. 6. Note further that one or more of the operations performed in 702-10 may be performed in various orders and combinations, including in parallel.

FIG. 8 illustrates an example 800 of an embodiment of the present disclosure. Specifically, FIG. 8 depicts a sequence of images 802A-02C captured over a period of time by an image capture device, such as one of the recording devices 112 of FIG. 1. Within each of the sequence of images 802A-02C is a set of objects 806A-06D representing objects in motion over the period of time. The example 800 further illustrates determining a trajectory 822 of a first object 806A based on a point cloud 820. Whereas FIGS. 3-7 illustrate techniques for matching objects between images captured in parallel by two or more image capture devices, the techniques illustrated in FIGS. 8-9 are applicable to matching objects in successive images captured by a single image capture device in order to determine a trajectory traversed by an object over time. Furthermore, combining the techniques shown in FIGS. 3-9 allows trajectories of objects in a sequence of images of a scene captured by a first image capture device to be matched with trajectories of the same objects in a sequence of images of the same scene captured by a second image capture device simultaneously with the first image capture device, as illustrated by FIG. 10.

In an embodiment, each of the sequence of images 802A-02C is an image, similar to the image 304A or the image 304B of FIG. 3, recorded by an image capture device. Each of the sequence of images 802A-02C may be an image frame of a plurality of image frames that comprise a video recording of an event involving the objects 806A-06D.

In an embodiment, the objects 806A-06D are digital representations of objects in the scene recorded as the sequence of images 802A-02C by an image capture device. The objects 806A-06D may be similar to the objects 306 of FIG. 3. Note that although only the first object 806A is shown as having a trajectory (e.g., the trajectory 822), it is contemplated that the techniques described may be applied to one or more of the other objects 806B-06D to compute their respective trajectories.

In an embodiment, the point clouds 820A-20C are sets of points generated as a result of applying a prediction algorithm (e.g., one or more particle filters) to positions of the first object 806A. Each of the set of points in the point clouds 820A-20C may represent a prediction, according to a filter result, of a position of the first object 806A (e.g., a reference point on, in, or proximate to the first object 806A) in the next frame. In some implementations, the one or more particle filters apply physical laws of motion to one or more sequential measurements (e.g., previous positions of the first object 806A) to arrive at an estimate for a next position of the first object 806A. In some examples, the one or more particle filters include a recursive filter, such as a Kalman filter. Thus, a point in the point clouds 820A-20C may be a result output by a particle filter based on the present position, and, in some implementations, a past position, of the first object 806A.

As illustrated in the first image 802A, the position of the first object 806A may be input to a set of particle filters to produce the first point cloud 820A. In the example 800, the first image 802A may be the first image in the sequence, and consequently previous position and/or point cloud values for the object 806A may be unavailable. In such a case, the one or more particle filters may be seeded with one or more default values, such as a pseudo-random Gaussian distribution of values in the vicinity of (e.g., proximate to) the object 806A.
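
A minimal sketch of this kind of predictor is given below. It assumes a simple constant-velocity particle model seeded with a Gaussian cloud around the first detection; the particle count and noise scales are assumptions, and the disclosure's filters may differ.

```python
# Sketch: seeding and advancing a point cloud of particle predictions.
import numpy as np

rng = np.random.default_rng(0)

def seed_point_cloud(position, n_particles=200, spread_px=15.0):
    """Seed particles (x, y, vx, vy) around an initial detected position."""
    xy = rng.normal(loc=position, scale=spread_px, size=(n_particles, 2))
    v = np.zeros((n_particles, 2))            # no velocity estimate yet
    return np.hstack([xy, v])

def predict_point_cloud(particles, process_noise_px=5.0):
    """Advance each particle one frame under a constant-velocity model."""
    advanced = particles.copy()
    advanced[:, :2] += advanced[:, 2:]        # x += vx, y += vy
    advanced[:, :2] += rng.normal(scale=process_noise_px, size=advanced[:, :2].shape)
    return advanced

cloud = predict_point_cloud(seed_point_cloud((642.0, 388.0)))
```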

The object 806A may be identified in the second image 802B as being the same as the object 806A in the first image 802A by virtue of the object 806A in the second image 802B being in a position that lies within the region of points predicted in the point cloud 820A or within a standard deviation of one of the points predicted in the point cloud 820A. As can be seen in the example 800, the objects 806B-06D are located outside the region predicted by the point cloud 820A, and the second object 806B, although being the closest of the objects 806B-06D to the point cloud 820A, is not as close as the first object 806A. Consequently, the system of the present disclosure may determine that the first object 806A in the second image 802B corresponds to the first object 806A in the first image 802A. The system of the present disclosure may then associate the present position of the first object 806A (in the second image 802B) with the previous position of the first object 806A (in the first image 802A) in order to generate the trajectory 822 thus far.
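
One simple way this association could be implemented is sketched below, assuming the detection closest to the mean predicted position continues the trajectory if it falls within a distance gate; gating by the cloud's standard deviation, as the disclosure also describes, would be a straightforward variation. The detections and gate value are hypothetical.

```python
# Sketch: associating a new detection with the predicted point cloud.
import numpy as np

def associate_detection(cloud, detections, gate_px=40.0):
    """Return the index of the detection closest to the cloud's mean, or None."""
    predicted_xy = cloud[:, :2].mean(axis=0)
    dists = np.linalg.norm(np.asarray(detections) - predicted_xy, axis=1)
    best = int(np.argmin(dists))
    return best if dists[best] <= gate_px else None

detections = [(655.0, 402.0), (720.0, 300.0), (500.0, 450.0)]
idx = associate_detection(cloud, detections)   # `cloud` from the previous sketch
```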

The point cloud 820B may be generated by at least inputting the new position of the first object 806A in the second image 802B into the one or more particle filters. In some implementations, the one or more particle filters may also receive as input a previous position of the first object 806A in the first image 802A and/or an indication of which of the particular particle filters appeared to most accurately predict the position of the first object 806A in the second image 802B. In some implementations, the types and/or numbers of particle filters used may be modified based at least in part on which, if any, of the previous particle filters appeared to most accurately predict the position of the first object 806A in the second image 802B. For example, the particle filters whose predictions were the farthest from the position determined to correspond to the first object 806A in the second image 802B may be replaced by different particle filters in the generation of the point cloud 820B. On the other hand, if all of the predictions were inaccurate (e.g., off by greater than a threshold distance or standard deviation), additional particle filters may be utilized to generate the point cloud 820B.
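The adjustment of the filter ensemble might look like the following sketch, which keeps the predictors whose predictions were closest to the observed position and replaces the rest; make_new is an assumed factory for fresh predictors and is not part of the original description.

    import numpy as np

    def refresh_predictors(predictors, predictions, observed_pos, keep_fraction=0.75, make_new=None):
        """Drop the worst-performing predictors and replace them with fresh ones.

        `predictors` and `predictions` are parallel lists; predictors whose
        predicted point was farthest from the observed position are removed.
        """
        observed = np.asarray(observed_pos, dtype=float)
        errors = [np.linalg.norm(np.asarray(p, dtype=float) - observed) for p in predictions]
        order = np.argsort(errors)                  # best (smallest error) first
        n_keep = max(1, int(len(predictors) * keep_fraction))
        kept = [predictors[i] for i in order[:n_keep]]
        replacements = [make_new() for _ in range(len(predictors) - n_keep)] if make_new else []
        return kept + replacements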

In the third image 802C, the object 806A is seen to have moved to a next position. Again, as can be seen, the system determines, based on the proximity of the first object 806A in the third image 802C to the predictions of the point cloud 820B, that the first object 806A in the third image 802C is the same as the first object 806A from the second image 802B. Additionally or alternatively, the system may make its determination based on a velocity (e.g., speed and/or direction) of the first object 806A, as determined by a previous change in position of the first object 806A from the point in time represented by the first image 802A to the point in time represented by the second image 802B.

Consequently, the trajectory 822 may be updated to include the path traveled by the first object 806A from the point in time represented by the second image 802B to the point in time represented by the third image 802C. The point cloud 820C may then be generated by at least inputting the new position of the first object 806A in the third image 802C. Likewise, in some implementations, the one or more particle filters may receive as input a previous position of the first object 806A (e.g., one or both of the positions in the second image 802B or the first image 802A) and/or an indication of which of the particular particle filters appeared to most accurately predict the position of the first object 806A in the third image 802C. In some implementations, the particle filters may again be adjusted in the manner described above.

In some situations, due to any of a variety of factors (e.g., inclement weather, reduced image quality, obscured visibility, an object being blocked by another object, etc.), the system of the present disclosure may be unable to detect the object 806A within a particular image frame. In such cases, the particle filter inputs may include such data, and the point cloud (e.g., the point cloud 820C) may expand to cover more area to account for additional distance the first object 806A may have moved between frames. Upon redetection of the first object 806A in a subsequent frame, the trajectory 822 may be extended to the position at which the first object 806A is located in the subsequent frame. In a case where the system cannot establish, above a threshold certainty value, that the first object 806A is the same as the first object 806A detected in a previous frame, the system may begin a new trajectory for the first object 806A. The new trajectory and the old trajectory may be stitched together (e.g., the set of coordinates of the new trajectory appended to the set of coordinates of the old trajectory) based on input from a user after a manual examination of the two trajectories.

In an embodiment, the trajectory 822 represents a path taken by the first object 806A over the time period of the sequence of images 802A-02C. The trajectory 822 may be stored as a series of points (i.e., positions) or vectors. The points of the trajectory 822 may be locations in physical space, locations in three-dimensional virtual space, coordinates in a two-dimensional space (e.g., an aerial view of the scene represented by the images 802A-02C), pixel coordinates in an image, or some other unit for designating the position of a point. A line connecting each point in sequence may be considered to be the path traveled by the object corresponding to the trajectory 822, which, in the example 800, would be the first object 806A. The process described above may continue for as many images as are in the sequence of images.
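One plausible in-memory representation of such a trajectory is sketched below; the class name and fields are illustrative assumptions, simply storing one (frame, x, y) point per image in which the object was identified.

    from dataclasses import dataclass, field
    from typing import List, Tuple

    @dataclass
    class Trajectory:
        """Ordered path of a single tracked object, one point per image frame."""
        points: List[Tuple[int, float, float]] = field(default_factory=list)  # (frame, x, y)

        def extend(self, frame_index: int, x: float, y: float) -> None:
            self.points.append((frame_index, x, y))

        def as_path(self) -> List[Tuple[float, float]]:
            """Positions in frame order; connecting them gives the traveled path."""
            return [(x, y) for _, x, y in sorted(self.points)]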

FIG. 9 is a flowchart illustrating an example of a process 900 for tracking the trajectory of an object through a sequence of images in accordance with various embodiments. Some or all of the process 900 (or any other processes described, or variations and/or combinations of those processes) may be performed under the control of one or more computer systems configured with executable instructions and/or other data, and may be implemented as executable instructions executing collectively on one or more processors. The executable instructions and/or other data may be stored on a non-transitory computer-readable storage medium (e.g., a computer program persistently stored on magnetic, optical, or flash media).

For example, some or all of process 900 may be performed by any suitable system, such as a server in a data center, by various components of the environment 1300 described in conjunction with FIG. 13, such as the one or more web servers 1306 or the one or more application servers 1308, by multiple computing devices in a distributed system of a computing resource service provider, or by any electronic client device such as the electronic client device 1302. The process 900 includes a series of operations wherein an object is identified in each successive image frame in a sequence of images and a trajectory of the object is tracked frame-by-frame.

In 902, the system performing the process 900 obtains a sequence of images. The sequence of images may be a series of image frames that comprise a video recording of an event and may have been recorded/captured by a single image capture device (e.g., a video camera). Within one or more images in the sequence of images may be a digital representation (e.g., a set of pixels) of an object within the scene of the image. The object may be similar to one of the objects 106 of FIG. 1. For example, the object may be an animate object in motion that is captured in the sequence of images.

Proceeding from 902 to 904, the system performing the process 900 locates the object whose trajectory is to be tracked within the first image of the sequence. It may be that the first image contains a plurality of objects and, for each of the objects detected within the first image, the operations of 904-922 may be repeated in series or in parallel. The system may identify the object within the image using any of a variety of object detection methods as described in the present disclosure.

In the case where the system returns to 904 from 912, the system may locate the object in the next image in the sequence. This may be performed by locating one or more objects in the image and determining which of the one or more objects are within a set of predicted locations or, if no objects are within the set of predicted locations, which of the one or more objects is closest to the set of predicted locations. In a situation where multiple objects are within the set of predicted locations (or in the case of a tie for being the closest to the set of predicted locations), the system may identify which object to consider as the object based at least in part on various factors (e.g., average distance to each of the predicted locations in the set, etc.). In some implementations, the system may only consider objects that are found within a certain distance of the last identified location of the object. In some of these implementations, the system may increase the certain distance based on the number of frames since the object was last identified (e.g., if the object has not been detected for a number of frames, expand the search area). Upon identifying the object, the system determines the position of the object in a coordinate system. In some implementations, the position of the object in a coordinate system is the two-dimensional coordinates of a pixel position in the image. In some implementations, the coordinate system may be represented in real-world units (e.g., centimeters, feet, etc.) and may be two-dimensional (X-Y) or three-dimensional (X-Y-Z).
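A sketch of this candidate-selection step is given below, assuming each detection and predicted location is an (x, y) coordinate; the growth factor applied to the search radius for missed frames is an assumed heuristic rather than a value from the disclosure.

    import numpy as np

    def select_candidate(detections, predicted_points, last_known_pos,
                         base_radius=50.0, frames_missed=0, growth=1.5):
        """Choose which detection to treat as the tracked object, if any.

        Only detections within a search radius of the last known position are
        considered; the radius grows with the number of frames since the object
        was last seen. Among the survivors, the detection with the smallest mean
        distance to the predicted locations wins. Returns None if none qualify.
        """
        radius = base_radius * (growth ** frames_missed)
        last = np.asarray(last_known_pos, dtype=float)
        cloud = np.asarray(predicted_points, dtype=float)

        best_idx, best_score = None, float("inf")
        for i, det in enumerate(detections):
            det = np.asarray(det, dtype=float)
            if np.linalg.norm(det - last) > radius:
                continue                            # outside the search area
            score = np.linalg.norm(cloud - det, axis=1).mean()
            if score < best_score:
                best_idx, best_score = i, score
        return best_idx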

For the first image in the sequence, the operations of 906 may be omitted. However, in 906, the system determines whether it has identified the object in the image being examined. It may be that, due to various factors, the system was unable to identify the object within a degree of certainty (e.g., a view of the object in the image is obscured, or multiple objects within the vicinity of the predictions have a probability above a threshold (e.g., above 25%) of corresponding to the object, so out of caution no object is selected). If the object is identified, the system proceeds to 916.

Otherwise, the system proceeds to 908, where the system performing the process 900 determines whether a set of predicted locations exists. If not (e.g., in the case where the image is the first image in the sequence), the system may initialize the set of predicted locations. As noted in the present disclosure, the set of predicted locations may be results returned from executing a set of particle filters (also referred to as a “prediction model”). The set of particle filters may be an ensemble of Kalman filters that applies physical laws of motion to multiple sequential measurements (e.g., previous positions of the object) to return an estimate of a future position of the object. Thus, if the set of predicted locations produced by the set of particle filters exists, the system may proceed to 914. Otherwise, the system may proceed to 910 to initialize the set of predictions.

In 910, the system initializes the set of particle filters (also known as the prediction model). In embodiments, the set of particle filters may be initialized by seeding the set of particle filters with the current position of the object and a Gaussian distribution of positions in the vicinity of the current position of the object. Using the set of particle filters, in 912, the system performing the process 900 generates a set of predicted locations for the object in the next image. Thereafter, the system returns to 904 to use the set of predicted locations to identify the object in the next image.
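The initialization (910) and prediction (912) steps might be sketched as follows, with Gaussian-distributed seed states and zero initial velocity standing in for the default values described above; all names are illustrative.

    import numpy as np

    def init_prediction_model(initial_pos, n_filters=50, seed_std=10.0, rng=None):
        """Seed the prediction model with Gaussian-distributed states around the
        object's current position, each with zero initial velocity (step 910)."""
        rng = np.random.default_rng() if rng is None else rng
        positions = np.asarray(initial_pos, dtype=float) + rng.normal(scale=seed_std, size=(n_filters, 2))
        return {"pos": positions, "vel": np.zeros_like(positions)}

    def generate_predictions(model):
        """Advance every seeded state one frame to produce the set of predicted
        locations for the next image (step 912)."""
        return model["pos"] + model["vel"]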

Proceeding to 916, if the object is determined to have been found in 906, the system performing the process 900 adds the current position of the object to a set of past object positions that comprise a trajectory of the object. The trajectory may be a series of coordinates that define a path taken by the object over the time period during which the sequence of images was captured.

In some implementations, further after determining in 906 that the object was found, the system may determine which of the set of predicted locations was the best fit to the actual object position. The system may adjust the prediction model based on the predicted location that was found to be the most accurate (e.g., closest to the actual position), and in this manner may tune itself to make more accurate predictions over time.

In 918, the system performing the process 900 updates the prediction model. In some embodiments, updating the prediction model includes adding the current position of the object as input to the set of particle filters. In some implementations, updating the prediction model includes adding or removing particle filters from the set of particle filters. Furthermore, in some implementations, updating the prediction model may include expanding the area of the point cloud (i.e., the distribution of the set of particle filters), such as in the case where the system was unable to determine that the object was found within the current image in 906.

In 920, the system performing the process 900 determines whether it has reached the last image in the sequence of images. If not, the system proceeds to 912 to generate a new set of predicted locations for the object in the next image based at least in part on the updated prediction model of 918. Otherwise, the system proceeds to 922, whereupon the system outputs the trajectory. In some cases, the system may output the trajectory to a storage destination, such as a data store, for later use or analysis. In other cases, the system may output the trajectory to be processed further, such as by matching trajectories in sequences of images captured by multiple image capture devices, such as in the manner described in FIG. 10.

Note that one or more of the operations performed in 902-22 may be performed in various orders and combinations, including in parallel. Furthermore, the operations in 904-22 may be repeated in parallel or in series for each of multiple objects that the system performing the process 900 is able to detect within the images.

FIG. 10 illustrates an example 1000 of an embodiment of the present disclosure. Specifically, FIG. 10 depicts a mapping of trajectories 1022A-22B of objects 1006A-06B in a first sequence of images 1002A captured by a first image capture device to trajectories 1022C-22D of objects 1006C-06D in a second sequence of images 1002B captured by a second image capture device, and, in turn, to trajectories 1022E-22G of objects 1006E-06G in a third sequence of images 1002C captured by a third image capture device. Note that the techniques described in connection with FIG. 10 may be applied to sequences of images captured by any number of image capture devices, from two upward.

In an embodiment, each of the sequences of images 1002A-02C is similar to the sequence of images 802A-02C of FIG. 8; the first sequence of images 1002A was captured by a first camera, the second sequence of images 1002B was captured by a second camera, and the third sequence of images 1002C was captured by a third camera. The sequences of images 1002A-02C may be recordings of the same scene due to overlap in the fields of view of the respective recording devices, similar to the situations depicted in FIGS. 1-3. As a result, the sequences of images 1002A-02C may include digital representations of one or more objects in the same scene common to some or all of the image capture devices.

In an embodiment, the objects 1006A-06G are digital representations (e.g., sets of pixels) of physical objects recorded within the images of the sequences of images 1002A-02C by three image capture devices. In an embodiment, the trajectories 1022A-22G are sets of points (e.g., coordinates) that represent the paths of motion followed by the physical objects represented by the objects 1006A-06G during the event captured in the sequences of images 1002A-02C.

In the example 1000, three video cameras have simultaneously captured sequences of images. In the example 1000, it is determined from the multi-camera homogenous object alignment process of FIGS. 6-7 that the objects 1006A-06B (A and B) in the first sequence of images 1002A correspond to the objects 1006D and 1006C (Y and X, respectively) in the second sequence of images 1002B and to the objects 1006F and 1006E (M and N, respectively) in the third sequence of images 1002C. Consequently, the system of the present disclosure can determine that the trajectories 1022A, 1022D, and 1022F, as determined as a result of the process of FIG. 9, correspond to the same trajectory, but from different perspectives. Similarly, the system may determine that the trajectories 1022B, 1022C, and 1022E also correspond to another same trajectory, but from different perspectives.

Matrices for the trajectories of the objects 1006A-06F may be expressed in the following manner for a first set of image frames (e.g., frame_(α) to frame_(α+5)) of each of the sequences of images 1002A-02C:

[A Y M]
[B X N]

However, as can be seen in the example 1000, something happens with the third sequence of images 1002C such that the system loses track of the object 1006F (M). However, through continued processing of the sequences of images via the processes of FIGS. 6-7 and 9, the system determines a trajectory 1022G for a different object 1006G and further determines that the objects 1006A and 1006D correspond with the different object 1006G. This further provides evidence to the system that the trajectory 1022G corresponds to parts of the trajectories 1022A and 1022D because they spatially and temporally align in the manner described above. Consequently, matrices for the trajectories of the objects 1006A-06E and 1006G may be expressed in the following manner for a second set of image frames (e.g., frame_(α+10) to frame_(α+15)) of each of the sequences of images 1002A-02C:

[A Y O]
[B X N]

Thus, a first physical object represented by the objects 1006A, 1006D, and 1006F-06G over the sequences of images 1002A-02C may be represented as the set of trajectories [A][Y][M][O]. Likewise, a second physical object represented by the objects 1006B-06C and 1006E over the sequences of images 1002A-02C may be represented as the set of trajectories [B][X][N]. In some implementations, the trajectories may be stitched together, such that the matrix for the first physical object becomes:

[A Y M∪O]
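Stitching might be implemented as a simple merge of the per-segment point lists, as in the sketch below, where each segment is a list of (frame, x, y) points; the de-duplication of overlapping frames is an assumption rather than a requirement of the disclosure.

    def stitch(*segments):
        """Stitch trajectory segments into one trajectory ordered by frame index,
        e.g. stitch(M, O) yields the combined trajectory M ∪ O."""
        merged = sorted(point for segment in segments for point in segment)
        deduped, seen = [], set()
        for frame, x, y in merged:
            if frame not in seen:                   # drop duplicates where segments overlap
                deduped.append((frame, x, y))
                seen.add(frame)
        return deduped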

In some cases, a trajectory may begin within the sequence of images. For example, an object that initially began outside the field of view of the image capture device may move into the field of view of the image capture device. In such a case, the trajectory may begin at the point that the object is first detected. In a similar manner, a trajectory may end before the end of the sequence of images, such as if an object that had been within the field of view of the image capture device moves outside the field of view of the image capture device. In cases where multiple image capture devices capture the same scene, one image capture device may have the object within its field of view while another image capture device does not. In such cases, a trajectory in the sequence of images of the first image capture device may have no corresponding trajectory within the sequence of images of the second image capture device.

Conversely, trajectories may be split. Splitting a trajectory may be used to rectify a situation where multiple objects cross paths and the system of the present disclosure inadvertently attributes motion of at least one object to the incorrect trajectory. This situation is illustrated in FIG. 11.

FIG. 11 illustrates an example 1100 of a misattributed trajectory of an embodiment of the present disclosure. Specifically, FIG. 11 depicts sequences of images 1102A-02B of an event recorded by image capture devices. In the actual event, the paths of the objects 1106A-06B cross from the view of a first image capture device, as illustrated by the paths 1122A-22B. However, in some cases the system performing the process 900 of FIG. 9 may initially misattribute at least a part of the trajectory of the first object 1106A to the second object 1106B and vice versa. This misattribution may be caused by the objects 1106A-06B being in such close proximity to each other for at least one frame that one of the objects 1106A-06B obscures the other or is otherwise within the point cloud of the other object, and is thereby mistakenly attributed to the other object's trajectory. An example of a stage where such a misattribution could occur is illustrated by the magnifying glass.

In an embodiment, the first sequence of images 1102A is a set of image frames captured by a first image capture device, such as the sequence of images 1002A of FIG. 10. Likewise, the second sequence of images 1102B is a set of image frames captured by a second image capture device, such as the sequence of images 1002B. In an embodiment, the objects 1106A-06D are digital representations (e.g., sets of pixels) of objects in the physical world recorded to the sequences of images 1102A-02B during the event captured by the image capture devices. In an embodiment, the actual trajectories 1122A-22D are the paths of motion actually taken by the respective objects 1106A-06D, such that each point of the trajectories 1122A-22D corresponds to a position occupied by the respective objects 1106A-06D.

The mis-assigned trajectories 1122E-22F show how the system performing the process 900 of FIG. 9 has inadvertently mis-assigned portions of the trajectories 1122A-22B. For example, a portion of the actual trajectory 1122B of the second object 1106B has been misattributed to the first object 1106A such that the mis-assigned trajectory 1122F begins at the beginning location of the first object 1106A and ends at the ending location of the second object 1106B. Likewise, a portion of the actual trajectory 1122A of the first object 1106A has been assigned to the second object 1106B such that the mis-assigned trajectory 1122E begins at the beginning location of the second object 1106B and ends at the ending location of the first object 1106A.

The cost 1118 reflects the effect that a mis-assigned trajectory has on the cost value of associating a point in the trajectory of an object in one camera with the analogous point in the mis-assigned trajectory over time (e.g., deeper into the sequence of image frames). It can be seen in the cost graph 1124 that the cost 1118 appears to indicate that the mis-assignment of the trajectories 1122E-22F began somewhere between frame 10 and frame 15. The cost threshold 1128 may be a specified value above which the trajectory associated with the cost 1118 is determined to be mis-assigned. The image frame at which the trajectory is determined to be mis-assigned may be the frame at which the mis-assigned trajectory needs to be split.
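A minimal sketch of locating the split frame from such a per-frame cost series follows; the threshold value is assumed to be supplied by the caller.

    def find_split_frame(costs, threshold):
        """Return the index of the first frame whose association cost exceeds the
        threshold (the frame at which the trajectory should be split), or None
        if the cost never indicates a mis-assignment."""
        for frame_index, cost in enumerate(costs):
            if cost > threshold:
                return frame_index
        return None

Applied to a per-frame series like the cost 1118 plotted in the cost graph 1124, such a check would report a split frame between frame 10 and frame 15.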

The first object 1106A and the first part of the mis-assigned trajectory 1122F are initially correctly attributed. However, by applying cost matrices to the corresponding points between the mis-assigned trajectory 1122E and the trajectory 1122D, it can be seen in the cost graph 1124 that the cost 1118 of associating the first object 1106A with the trajectory 1122F rises sharply as the trajectory 1122E diverges. Thus, in some implementations, if the cost 1118 is observed by the system of the present disclosure to exceed a particular cost threshold 1128, the system determines that a trajectory mis-assignment has occurred.

In the example 1100, it is illustrated that the first object 1106A represents the same object as the fourth object 1106D, but viewed from a different perspective. Likewise, it is illustrated that the second object 1106B is a representation of the same object represented by the third object 1106C, but also viewed from a different perspective. As illustrated in the example 1100, on an initial pass, the system determines that the mis-assigned trajectory 1122E corresponds to the trajectory 1122D and the mis-assigned trajectory 1122F corresponds to the trajectory 1122C.

To check the accuracy of the determination, the system applies a homography constraint to the trajectories. For example, for an image frame in the first sequence of images 1102A, the system computes an epipolar line in the corresponding image frame in the second sequence of images 1102B, with the epipolar line being based on the point in the trajectory 1122E that corresponds to that image frame in the first sequence of images 1102A. The system calculates a first distance (e.g., in pixels) of the point in the trajectory 1122D that corresponds to that image frame in the second sequence of images 1102B to the epipolar line.

Likewise, for an image frame in the second sequence of images 1102B, the system computes an epipolar line in the corresponding image frame in the first sequence of images 1102A, with the epipolar line being based on the point in the trajectory 1122D that corresponds to that image frame in the second sequence of images 1102B. The system calculates a second distance of the point in the trajectory 1122E that corresponds to that image frame in the first sequence of images 1102A to the epipolar line. In some implementations, the cost is a sum of the first distance and the second distance.
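A sketch of this symmetric cost follows, assuming the two cameras are related by a known fundamental matrix F (so that the epipolar line of a point in one image is obtained by multiplying its homogeneous coordinates by F or by F transposed); the function names are illustrative.

    import numpy as np

    def point_line_distance(point, line):
        """Distance in pixels from a 2-D point to a line (a, b, c) with ax + by + c = 0."""
        a, b, c = line
        x, y = point
        return abs(a * x + b * y + c) / np.hypot(a, b)

    def symmetric_epipolar_cost(pt1, pt2, F):
        """Cost of associating trajectory point pt1 (camera 1) with pt2 (camera 2):
        distance of pt2 to the epipolar line of pt1 plus distance of pt1 to the
        epipolar line of pt2."""
        p1 = np.array([pt1[0], pt1[1], 1.0])
        p2 = np.array([pt2[0], pt2[1], 1.0])
        line_in_2 = F @ p1                          # epipolar line of pt1 in image 2
        line_in_1 = F.T @ p2                        # epipolar line of pt2 in image 1
        return point_line_distance(pt2, line_in_2) + point_line_distance(pt1, line_in_1)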

In some implementations, the system also measures the distances of points in other objects' trajectories (e.g., the trajectories 1122F and 1122C) to the epipolar lines in order to generate a pair of cost matrices as described above in conjunction with FIG. 3. In such implementations, an optimization algorithm, such as the Hungarian algorithm, may be applied to determine a lowest cost assignment of trajectory points. This process may be repeated for at least a portion of the images in the sequences of images 1102A-02B. If the lowest cost assignment of trajectory points diverges at all, or diverges by a threshold amount, the system may split the mis-assigned trajectories and stitch (append) the trajectories according to the lowest cost assignment. Starting from the image frame corresponding to this split, the system may repeat the process 900 for the sequence of image frames (e.g., the sequence of image frames 1102A) to regenerate the trajectories from this point. In this manner, the portions of the mis-assigned trajectories 1122E-22F are split off and re-attached to the correct trajectories such that they match the actual trajectories 1122A-22B.
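The lowest-cost assignment step might be sketched as follows using the Hungarian algorithm as implemented by scipy.optimize.linear_sum_assignment; the divergence check against the previous frame's assignment is an assumed simplification of the divergence test described above.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def lowest_cost_assignment(cost_matrix):
        """Solve the trajectory-point assignment with the Hungarian algorithm.

        cost_matrix[i, j] is the cost of matching trajectory i in the first
        camera with trajectory j in the second camera at the current frame.
        Returns a dict {i: j} describing the lowest-cost assignment.
        """
        rows, cols = linear_sum_assignment(np.asarray(cost_matrix, dtype=float))
        return dict(zip(rows.tolist(), cols.tolist()))

    def assignment_diverged(current, previous):
        """True if the lowest-cost assignment differs from the previous frame's,
        signalling a possible mis-assigned trajectory that should be split."""
        return previous is not None and current != previous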

In some implementations, the system of the present disclosure may be trained to recognize the causes of erroneous trajectories and perform mitigation operations. For example, the system may be trained to identify characteristics of occluded visibility, lens flares, reflections, or ghost images and may discard such anomalies when identifying objects within or proximate to the point cloud. As another example, in a case where corresponding trajectories in both a first sequence of images captured by a first image capture device and a second sequence of images captured by a second image capture device are mis-assigned, the system may, by determining which of the trajectories to split and stitch for the first sequence of images, make a more accurate determination of which of the trajectories in the second sequence of images to split and stitch. In some implementations, one or more of various machine learning techniques may be implemented to improve accuracy in trajectory determination, such as supervised learning techniques, unsupervised learning techniques, semi-supervised learning techniques, transduction or transductive inference techniques, reinforcement learning, developmental learning, and the like.

Note that, in the context of describing disclosed embodiments, unless otherwise specified, use of expressions regarding executable instructions (also referred to as code, applications, agents, etc.) performing operations that “instructions” do not ordinarily perform unaided (e.g., transmission of data, calculations, etc.) denotes that the instructions are being executed by a machine, thereby causing the machine to perform the specified operations.

FIG. 12 is a flowchart illustrating an example of a process 1200 for validating trajectories in accordance with various embodiments. Some or all of the process 1200 (or any other processes described, or variations and/or combinations of those processes) may be performed under the control of one or more computer systems configured with executable instructions and/or other data, and may be implemented as executable instructions executing collectively on one or more processors. The executable instructions and/or other data may be stored on a non-transitory computer-readable storage medium (e.g., a computer program persistently stored on magnetic, optical, or flash media).

For example, some or all of process 1200 may be performed by any suitable system, such as a server in a data center, by various components of the environment 1300 described in conjunction with FIG. 13, such as the one or more web servers 1306 or the one or more application servers 1308, by multiple computing devices in a distributed system of a computing resource service provider, or by any electronic client device such as the electronic client device 1302. The process 1200 includes a series of operations wherein trajectories are determined for identified objects in sets of images captured by image capture devices, costs of assigning a trajectory in one set of images to a trajectory in another of the sets of images are calculated, and, if the cost indicates that a trajectory has been mis-assigned, the trajectory is split and rejoined to a trajectory that is a better fit. These operations may be repeated until the trajectories achieve a steady state, at which point the trajectories are validated and may be output.

In 1202, the system performing the process 1200 obtains a first set of images captured by a first image capture device and a second set of images captured by a second image capture device. The sets of images may be sequences of images (e.g., digital video) of the same scene but from different positions and perspectives. Note that the techniques are described in terms of two sets of images, but it is contemplated that the process 1200 may be extended to more than two sets of images that share representations of the same object.

In 1204, the system performing the process 1200 determines, for each of the sets of images, a trajectory of an object within the scene in the manner described in the process 900 of FIG. 9. That is, the system determines a trajectory for a representation of an object in the first set of images that is determined to correspond to a trajectory for a representation of the same object in the second set of images.

In 1206, the system performing the process 1200 obtains a first image that is associated with a point in a first trajectory from the first set of images and a corresponding second image that is associated with a point in a second trajectory from the second set of images, where the first trajectory and the second trajectory were, at least initially, determined to correspond to the same object (e.g., by using an epipolar and/or homography constraint as described in conjunction with FIGS. 2-7).

In 1208, the system performing the process 1200 calculates the cost of associating the first trajectory with the second trajectory. The cost may be calculated in a variety of ways. For example, in an implementation, the cost is the sum of the distance of the point in the first trajectory to an epipolar line in the first image that corresponds to the point in the second trajectory in the second image and the distance of the point in the second trajectory to an epipolar line in the second image that corresponds to the point in the first trajectory. Further details on these calculations may be found in the description of FIGS. 2-3. In another implementation, the system builds cost matrices based on epipolar lines and points in all of the trajectories in the first and second images, and a combinatorial optimization algorithm is used to determine the lowest-cost assignment.

In 1210, the system performing the process 1200 determines whether the cost calculated in 1208 indicates that the trajectories at the points represented in the first image and the second image are unlikely to be correctly matched. For example, in an implementation, if the calculated cost exceeds a threshold value, the system determines that the trajectories are not, or are no longer, correctly matched. As another example, in another implementation, if, based on results from a combinatorial optimization algorithm, the lowest-cost assignment does not match the first trajectory with the second trajectory, the system determines that the trajectories are not, or are no longer, correctly matched. As a result of determining that the first trajectory is an incorrect match to the second trajectory, the system proceeds to 1212. Otherwise, the system may proceed to 1218.

In 1212, the system performing the process 1200, having determined that the first trajectory is an incorrect match to the second trajectory, splits the trajectories. That is, the system may conclude the portions of the first and second trajectories assigned to a first object at the points represented by the previous image frames in the first and second sets of images. However, it may not be clear at this stage which of the first trajectory or the second trajectory is mis-assigned. Thus, in some implementations the system may perform operations similar to the epipolar and/or homography constraints described in conjunction with FIGS. 2-7; that is, the system may generate cost matrices using distances to epipolar lines from points in the trajectories of the first and second images and apply a combinatorial optimization algorithm to determine whether either the first trajectory or the second trajectory should be assigned, at this point in the trajectory, to a different object.

In 1214, the system performing the process 1200 determines whether the cost assignment performed in 1212 indicates that the first trajectory and/or the second trajectory at the point associated with the first image and the second image should be associated with a different object. If not, the system may proceed to 1218. Otherwise, the system proceeds to 1216.

In 1216, the system performing the process 1200, having determined that the first trajectory and/or the second trajectory should be associated with a different object, appends (stitches) the remaining portion of the mis-assigned trajectory to the trajectory of the different object. In some cases, if the different object is already associated with its own trajectory, the system may append/stitch the remaining portions of that trajectory to the object previously associated with the mis-assigned trajectory.

In 1218, the system performing the process 1200 determines whether the end of the first and second sets of images has been reached. If all images in the first and second sets of images have not been processed, the system may return to 1206 to obtain the next images in the sets. Otherwise, if all of the images have been processed, the system proceeds to 1220.

In 1220, the system performing the process 1200 determines whether all trajectories computed in the process 1200 have achieved a steady state. In some examples, a “steady state” is achieved when no further trajectories are determined to be mis-assigned after a specified number of iterations (e.g., one iteration, two iterations, three iterations, etc.) of the operations 1204-18 (e.g., no further trajectory splits are performed). If, however, one or more trajectories were determined to be mis-assigned during the most recent performance of the operations of 1204-18, the system may return to 1204 to repeat the trajectory validation process. Otherwise, the system proceeds to 1222 to output the trajectories that have been validated as properly assigned. Note that one or more of the operations performed in 1202-22 may be performed in various orders and combinations, including in parallel.
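The outer validation loop might be sketched as follows, where run_validation_pass is an assumed callable that performs one pass of the operations 1204-18 over the trajectories and reports how many splits it made; the iteration cap is an assumed safeguard.

    def validate_until_steady(trajectories, run_validation_pass, max_iterations=10):
        """Repeat the split-and-stitch validation pass until a pass performs no
        further splits (a steady state) or an iteration cap is reached."""
        for _ in range(max_iterations):
            trajectories, n_splits = run_validation_pass(trajectories)
            if n_splits == 0:
                return trajectories                 # validated: no mis-assignments found
        return trajectories                         # cap reached; return best effort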

Once trajectories have been determined and objects have been identified, the system of the present disclosure may be able to locate the position of any object in the sequences of images at any specified time or image frame. For example, through an interface, a user may submit a request to the system of the present disclosure for the state (e.g., coordinates and/or velocity) of a particular object in the 10th frame of videos of the same scene recorded by three cameras. Additionally or alternatively, the system of the present disclosure may cause a display screen (e.g., by sending signals or image data to the display screen) to highlight or otherwise mark the object in a depiction of the image frame on the screen. In this manner, a user may view and track an object's motion in any or all camera views of the scene(s) containing the object image. The object's motion, trajectory, and/or position may be displayed in an image, in multiple images side-by-side, or as moving visual media playing the sequences of images at any of a variety of speeds (e.g., fast, slow, step, etc.). In some examples, an “interface” refers to computer hardware or software designed to communicate information between hardware devices, between software programs, between devices and programs, or between a device and a user.

The techniques described herein may be used to analyze the motion and impact of objects before a crash, explosions, trajectories of shrapnel, ballistics, the manner in which the motion and velocity of various compositions of matter are affected by impact and ricochet, and other kinematics studies. Note that the techniques of the present disclosure may be applied to distinguish objects and track the trajectories of various objects in motion that are captured within a sequence of image frames, whether the objects are moving at high speeds (e.g., automobiles, projectiles, objects in space, etc.), at low speeds (e.g., bacteria, plant growth, etc.), or at speeds in between (e.g., sports players, herd animals, migratory birds, etc.).

FIG. 13 illustrates aspects of an example environment 1300 forimplementing aspects in accordance with various embodiments. As will beappreciated, although a web-based environment is used for purposes ofexplanation, different environments may be used, as appropriate, toimplement various embodiments. The environment includes an electronicclient device 1302, which can include any appropriate device operable tosend and/or receive requests, messages, or information over anappropriate network 1304 and convey information back to a user of thedevice. Examples of such client devices include personal computers, cellphones, handheld messaging devices, laptop computers, tablet computers,set-top boxes, personal data assistants, embedded computer systems,electronic book readers, and the like.

The environment 1300 in one embodiment is a distributed and/or virtualcomputing environment utilizing several computer systems and componentsthat are interconnected via communication links, using one or morecomputer networks or direct connections. However, it will be appreciatedby those of ordinary skill in the art that such a system could operateequally well in a system having fewer or a greater number of componentsthan those illustrated in FIG. 13. Thus, the depiction in FIG. 13 shouldbe taken as being illustrative in nature and not limiting to the scopeof the disclosure.

The network 1304 can include any appropriate network, including anintranet, the Internet, a cellular network, a local area network, asatellite network or any other network, and/or combination thereof.Components used for such a system can depend at least in part upon thetype of network and/or environment selected. Many protocols andcomponents for communicating via such network 1304 are well known andwill not be discussed in detail. Communication over the network 1304 canbe enabled by wired or wireless connections and combinations thereof. Inan embodiment, the network 1304 includes the Internet and/or otherpublicly-addressable communications network, as the environment 1300includes one or more web servers 1306 for receiving requests and servingcontent in response thereto, although for other networks an alternativedevice serving a similar purpose could be used as would be apparent toone of ordinary skill in the art.

The illustrative environment 1300 includes one or more applicationservers 1308 and data storage 1310. It should be understood that therecan be several application servers, layers or other elements, processesor components, which may be chained or otherwise configured and caninteract to perform tasks such as obtaining data from an appropriatedata store. Servers, as used, may be implemented in various ways, suchas hardware devices or virtual computer systems. In some contexts,“servers” may refer to a programming module being executed on a computersystem. As used, unless otherwise stated or clear from context, the term“data store” or “data storage” refers to any device or combination ofdevices capable of storing, accessing, and retrieving data, which mayinclude any combination and number of data servers, databases, datastorage devices, and data storage media, in any standard, distributed,virtual, or clustered environment.

The one or more application servers 1308 can include any appropriatehardware, software, and firmware for integrating with the data storage1310 as needed to execute aspects of one or more applications for theelectronic client device 1302, handling some or all of the data accessand business logic for an application. The one or more applicationservers 1308 may provide access control services in cooperation with thedata storage 1310 and is able to generate content including text,graphics, audio, video, and/or other content usable to be provided tothe user, which may be served to the user by the one or more web servers1306 in the form of HyperText Markup Language (HTML), Extensible MarkupLanguage (XML), JavaScript, Cascading Style Sheets (CS S), JavaScriptObject Notation (JSON), and/or another appropriate client-sidestructured language. Content transferred to the electronic client device1302 may be processed by the electronic client device 1302 to providethe content in one or more forms, including forms that are perceptibleto the user audibly, visually, and/or through other senses. The handlingof all requests and responses, as well as the delivery of contentbetween the electronic client device 1302 and the one or moreapplication servers 1308, can be handled by the one or more web servers1306 using Hypertext Preprocessor (PHP), Python, Ruby, Perl, Java, HTML,XML, JSON, and/or another appropriate server-side structured language inthis example. Further, operations described as being performed by asingle device may, unless otherwise clear from context, be performedcollectively by multiple devices, which may form a distributed and/orvirtual system.

Each server typically will include an operating system that providesexecutable program instructions for the general administration andoperation of that server and typically will include a computer-readablestorage medium (e.g., a hard disk, random access memory, read onlymemory, etc.) storing instructions that, when executed (i.e., as aresult of being executed) by a processor of the server, allow the serverto perform its intended functions.

The data storage 1310 can include several separate data tables,databases, data documents, dynamic data storage schemes, and/or otherdata storage mechanisms and media for storing data relating to aparticular aspect of the present disclosure. For example, the datastorage 1310 may include mechanisms for storing various types of dataand user information 1316, which can be used to serve content to theelectronic client device 1302. The data storage 1310 also is shown toinclude a mechanism for storing log data, such as application logs,system logs, access logs, and/or various other event logs, which can beused for reporting, analysis, or other purposes. It should be understoodthat there can be many other aspects that may need to be stored in thedata storage 1310, such as page image information and access rightsinformation, which can be stored in any of the above listed mechanismsas appropriate or in additional mechanisms in the data storage 1310. Thedata storage 1310 is operable, through logic associated therewith, toreceive instructions from the one or more application servers 1308 andobtain, update, or otherwise process data in response thereto. The oneor more application servers 1308 may provide static, dynamic, or acombination of static and dynamic data in response to the receivedinstructions. Dynamic data, such as data used in web logs (blogs),shopping applications, news services, and other applications may begenerated by server-side structured languages as described or may beprovided by a content management system (CMS) operating on, or under thecontrol of, the one or more application servers 1308.

In one embodiment, a user, through a device operated by the user, cansubmit a search request for a match to a particular search term. In thisembodiment, the data storage 1310 might access the user information toverify the identity of the user and obtain information about items ofthat type. The information then can be returned to the user, such as ina results listing on a web page that the user is able to view via abrowser on the electronic client device 1302. Information related to theparticular search term can be viewed in a dedicated page or window ofthe browser. It should be noted, however, that embodiments of thepresent disclosure are not necessarily limited to the context of webpages, but may be more generally applicable to processing requests ingeneral, where the requests are not necessarily requests for content.

The various embodiments further can be implemented in a wide variety ofoperating environments, which in some embodiments can include one ormore user computers, computing devices, or processing devices that canbe used to operate a number of applications. User or client devices caninclude any number of computers, such as desktop, laptop, or tabletcomputers running a standard operating system, as well as cellular,wireless, and handheld devices running mobile software and capable ofsupporting a number of networking and messaging protocols. Such a systemalso can include a number of workstations running any of a variety ofcommercially available operating systems and other known applicationsfor purposes such as development and database management. These devicesalso can include other electronic devices, such as dummy terminals,thin-clients, gaming systems, and other devices capable of communicatingvia the network 1304. These devices also can include virtual devicessuch as virtual machines, hypervisors, and other virtual devices capableof communicating via the network 1304.

Various embodiments of the present disclosure utilize the network 1304that would be familiar to those skilled in the art for supportingcommunications using any of a variety of commercially availableprotocols, such as Transmission Control Protocol/Internet Protocol(TCP/IP), User Datagram Protocol (UDP), protocols operating in variouslayers of the Open System Interconnection (OSI) model, File TransferProtocol (FTP), Universal Plug and Play (UpnP), Network File System(NFS), and Common Internet File System (CIFS). The network 1304 can be,for example, a local area network, a wide-area network, a virtualprivate network, the Internet, an intranet, an extranet, a publicswitched telephone network, an infrared network, a wireless network, asatellite network, and any combination thereof. In some embodiments,connection-oriented protocols may be used to communicate between networkendpoints. Connection-oriented protocols (sometimes calledconnection-based protocols) are capable of transmitting data in anordered stream. Connection-oriented protocols can be reliable orunreliable. For example, the TCP protocol is a reliableconnection-oriented protocol. Asynchronous Transfer Mode (ATM) and FrameRelay are unreliable connection-oriented protocols. Connection-orientedprotocols are in contrast to packet-oriented protocols such as UDP thattransmit packets without a guaranteed ordering.

In embodiments utilizing the one or more web servers 1306, the one ormore web servers 1306 can run any of a variety of server or mid-tierapplications, including Hypertext Transfer Protocol (HTTP) servers, FTPservers, Common Gateway Interface (CGI) servers, data servers, Javaservers, Apache servers, and business application servers. The server(s)also may be capable of executing programs or scripts in response torequests from user devices, such as by executing one or more webapplications that may be implemented as one or more scripts or programswritten in any programming language, such as Java®, C, C# or C++, or anyscripting language, such as Ruby, PHP, Perl, Python, or TCL, as well ascombinations thereof. The server(s) may also include database servers,including those commercially available from Oracle®, Microsoft®,Sybase®, and IBM® as well as open-source servers such as MySQL,Postgres, SQLite, MongoDB, and any other server capable of storing,retrieving, and accessing structured or unstructured data. Databaseservers may include table-based servers, document-based servers,unstructured servers, relational servers, non-relational servers, orcombinations of these and/or other database servers.

The environment 1300 can include a variety of data stores and othermemory and storage media as discussed above. These can reside in avariety of locations, such as on a storage medium local to (and/orresident in) one or more of the computers or remote from any or all ofthe computers across the network 1304. In a particular set ofembodiments, the information may reside in a storage-area network (SAN)familiar to those skilled in the art. Similarly, any necessary files forperforming the functions attributed to the computers, servers, or othernetwork devices may be stored locally and/or remotely, as appropriate.Where a system includes computerized devices, each such device caninclude hardware elements that may be electrically coupled via a bus,the elements including, for example, a central processing unit (CPU orprocessor), an input device (e.g., a mouse, keyboard, controller, touchscreen, or keypad), and an output device (e.g., a display device,printer, or speaker). Such a system may also include one or more storagedevices, such as disk drives, optical storage devices, and solid-statestorage devices such as random access memory (RAM) or read-only memory(ROM), as well as removable media devices, memory cards, flash cards,etc.

Such devices also can include a computer-readable storage media reader,a communications device (e.g., a modem, a network card (wireless orwired), an infrared communication device, etc.), and working memory asdescribed above. The computer-readable storage media reader can beconnected with, or configured to receive, a computer-readable storagemedium representing remote, local, fixed, and/or removable storagedevices as well as storage media for temporarily and/or more permanentlycontaining, storing, transmitting, and retrieving computer-readableinformation. The system and various devices will typically also includea number of software applications, modules, services, or other elementslocated within a working memory device, including an operating systemand application programs, such as a client application or web browser.In addition, customized hardware might also be used and/or particularelements might be implemented in hardware, software (including portablesoftware, such as applets), or both. Further, connection to othercomputing devices such as network input/output devices may be employed.

Storage media and computer readable media for containing code, orportions of code, can include any appropriate media known or used in theart, including storage media and communication media, such as volatileand non-volatile, removable and non-removable media implemented in anymethod, or technology for storage and/or transmission of informationsuch as computer readable instructions, data structures, programmodules, or other data, including RAM, ROM, Electrically ErasableProgrammable Read-Only Memory (EEPROM), flash memory or other memorytechnology, Compact Disc Read-Only Memory (CD-ROM), digital versatiledisk (DVD), or other optical storage, magnetic cassettes, magnetic tape,magnetic disk storage, or other magnetic storage devices, or any othermedium which can be used to store the desired information and can beaccessed by the system device. Based on the disclosure and teachingsprovided, a person of ordinary skill in the art will appreciate otherways and/or methods to implement the various embodiments.

The specification and drawings are, accordingly, to be regarded in anillustrative rather than a restrictive sense. However, it will beevident that various modifications and changes may be made thereuntowithout departing from the broader spirit and scope of the invention asset forth in the claims. Other variations are within the spirit of thepresent disclosure. Thus, while the disclosed techniques are susceptibleto various modifications and alternative constructions, certainillustrated embodiments thereof are shown in the drawings and have beendescribed above in detail. It should be understood, however, that thereis no intention to limit the invention to the specific form or formsdisclosed, but on the contrary, the intention is to cover allmodifications, alternative constructions, and equivalents falling withinthe spirit and scope of the invention, as defined in the appendedclaims.

The use of the terms “a,” “an,” “the,” and similar referents in thecontext of describing the disclosed embodiments (especially in thecontext of the following claims) are to be construed to cover both thesingular and the plural, unless otherwise indicated or clearlycontradicted by context. The terms “comprising,” “having,” “including,”and “containing” are to be construed as open-ended terms (i.e., meaning“including, but not limited to,”) unless otherwise noted. The term“connected,” where unmodified and referring to physical connections, isto be construed as partly or wholly contained within, attached to, orjoined together, even if there is something intervening. Recitation ofranges of values are merely intended to serve as a shorthand method ofreferring individually to each separate value falling within the range,unless otherwise indicated and each separate value is incorporated intothe specification as if it were individually recited. The use of theterm “set” (e.g., “a set of items”) or “subset,” unless otherwise notedor contradicted by context, is to be construed as a nonempty collectioncomprising one or more members. Further, unless otherwise noted orcontradicted by context, the term “subset” of a corresponding set doesnot necessarily denote a proper subset of the corresponding set, but thesubset and the corresponding set may be equal.

Conjunctive language, such as phrases of the form “at least one of A, B,and C,” or “at least one of A, B and C,” is understood with the contextas used in general to present that an item, term, etc. may be either Aor B or C, or any nonempty subset of the set of A and B and C, unlessspecifically stated otherwise or otherwise clearly contradicted bycontext. For instance, in the illustrative example of a set having threemembers, the conjunctive phrases “at least one of A, B, and C” and “atleast one of A, B and C” refer to any of the following sets: {A}, {B},{C}, {A, B}, {A, C}, {B, C}, {A, B, C}. Thus, such conjunctive languageis not generally intended to imply that certain embodiments require atleast one of A, at least one of B and at least one of C each to bepresent. In addition, unless otherwise noted or contradicted by context,the term “plurality” indicates a state of being plural (e.g., “aplurality of items” indicates multiple items). The number of items in aplurality is at least two, but can be more when so indicated eitherexplicitly or by context.

Operations of processes described can be performed in any suitable orderunless otherwise indicated or otherwise clearly contradicted by context.Processes described (or variations and/or combinations thereof) may beperformed under the control of one or more computer systems configuredwith executable instructions and may be implemented as code (e.g.,executable instructions, one or more computer programs, or one or moreapplications) executing collectively on one or more processors, byhardware, or combinations thereof. The code may be stored on acomputer-readable storage medium, for example, in the form of a computerprogram comprising instructions executable by one or more processors.The computer-readable storage medium may be non-transitory. In someembodiments, the code is stored on a set of one or more non-transitorycomputer-readable storage media having stored thereon executableinstructions that, when executed (i.e., as a result of being executed)by one or more processors of a computer system, cause the computersystem to perform operations described herein. The set of non-transitorycomputer-readable storage media may comprise multiple non-transitorycomputer-readable storage media and one or more of individualnon-transitory storage media of the multiple non-transitorycomputer-readable storage media may lack all of the code while themultiple non-transitory computer-readable storage media collectivelystore all of the code. Further, in some embodiments, the executableinstructions are executed such that different instructions are executedby different processors. As an illustrative example, a non-transitorycomputer-readable storage medium may store instructions. A main CPU mayexecute some of the instructions and a graphics processor unit mayexecute other of the instructions. Generally, different components of acomputer system may have separate processors and different processorsmay execute different subsets of the instructions.

Accordingly, in some embodiments, computer systems are configured to implement one or more services that singly or collectively perform operations of processes described herein. Such computer systems may, for instance, be configured with applicable hardware and/or software that enable the performance of the operations. Further, computer systems that implement various embodiments of the present disclosure may in some embodiments be single devices and in other embodiments be distributed computer systems comprising multiple devices that operate differently, such that the distributed computer system performs the operations described and such that a single device may not perform all operations.

The use of any examples, or exemplary language (e.g., “such as”) provided, is intended merely to better illuminate embodiments of the invention and does not pose a limitation on the scope of the invention unless otherwise claimed. No language in the specification should be construed as indicating any non-claimed element as essential to the practice of the invention.

Embodiments of this disclosure are described, including the best mode known to the inventors for carrying out the invention. Variations of those embodiments may become apparent to those of ordinary skill in the art upon reading the foregoing description. The inventors expect skilled artisans to employ such variations as appropriate, and the inventors intend for embodiments of the present disclosure to be practiced otherwise than as specifically described. Accordingly, the scope of the present disclosure includes all modifications and equivalents of the subject matter recited in the claims appended hereto as permitted by applicable law. Moreover, although above-described elements may be described in the context of certain embodiments of the specification, unless stated otherwise or otherwise clear from context, these elements are not mutually exclusive to only those embodiments in which they are described; any combination of the above-described elements in all possible variations thereof is encompassed by the scope of the present disclosure unless otherwise indicated or otherwise clearly contradicted by context.

All references, including publications, patent applications, and patents cited are hereby incorporated by reference to the same extent as if each reference were individually and specifically indicated to be incorporated by reference and were set forth in its entirety.

What is claimed is:
1. A computer-implemented method, comprising: obtaining, via a plurality of image capture devices sharing different perspective views of a common scene, a first image and a second image; determining a first point corresponding to an object identified in the first image; determining, based at least in part on the first point and relative positions of the plurality of image capture devices, a line in the second image corresponding to the first point in the first image; determining, in the second image, a second point corresponding to a first set of pixels and a third point corresponding to a second set of pixels; determining a first distance from the second point to the line and a second distance from the third point to the line; calculating, based at least in part on the first distance and the second distance, a set of cost values; determining, based at least in part on the set of cost values, that the first set of pixels represents, in the second image, the object identified in the first image; and associating the first set of pixels with the object.
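By way of a non-limiting illustration only (not part of the claimed subject matter), the line determination and distance-based costs of claim 1 might be sketched in Python as follows. The fundamental matrix F, the point coordinates, and the function names are hypothetical placeholders; in practice a matrix of this kind would be derived from the relative positions of the image capture devices.

    import numpy as np

    def epipolar_line(F, point):
        # Line l = F @ [x, y, 1] in the second image for a point in the first image.
        x, y = point
        return F @ np.array([x, y, 1.0])

    def distance_to_line(line, point):
        # Perpendicular pixel distance from a 2-D point to a homogeneous line (a, b, c).
        a, b, c = line
        x, y = point
        return abs(a * x + b * y + c) / np.hypot(a, b)

    # Hypothetical inputs: F from calibration, first_point detected in the first
    # image, candidate centroids (the second and third points) in the second image.
    F = np.eye(3)
    first_point = (412.0, 300.5)
    candidates = [(398.2, 310.1), (520.7, 288.4)]

    line = epipolar_line(F, first_point)
    costs = [distance_to_line(line, p) for p in candidates]
    best = int(np.argmin(costs))  # index of the set of pixels associated with the object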
2. The computer-implemented method of claim 1, wherein determining that the first set of pixels represents the object includes applying a combinatorial optimization algorithm to the set of cost values.
3. The computer-implemented method of claim 1, wherein: the method further comprises: transforming, based at least in part on the different perspective views, the first set of pixels and the second set of pixels into transformed sets of pixels; and modifying, based at least in part on the transformed sets of pixels, the set of cost values to produce modified cost values; and determining that the first set of pixels represents the object in the second image is performed based at least in part on the modified cost values.
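As a non-limiting sketch of the transformation recited in claim 3 (illustrative only), the perspective difference between views might be compensated with a planar homography before the cost values are modified. The homography H, the weighting factor, and the coordinates below are assumptions for illustration; OpenCV's cv2.perspectiveTransform performs the point mapping.

    import numpy as np
    import cv2

    def homography_penalty(H, first_point, candidate_point):
        # Map a candidate point from the second image into the first image under a
        # homography H and return its pixel distance from the object's point there.
        src = np.array([[candidate_point]], dtype=np.float64)  # shape (1, 1, 2) for OpenCV
        mapped = cv2.perspectiveTransform(src, H)[0, 0]
        return float(np.linalg.norm(mapped - np.asarray(first_point, dtype=np.float64)))

    # Hypothetical values; H would normally be calibrated from a shared plane.
    H = np.eye(3, dtype=np.float64)
    epipolar_cost = 4.2
    modified_cost = epipolar_cost + 0.5 * homography_penalty(H, (412.0, 300.5), (398.2, 310.1))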
4. The computer-implemented method of claim 1, further comprising: calculating, based at least in part on positions of the plurality of image capture devices in a physical space, an expected position of the object in the physical space; and modifying the set of cost values based at least in part on the expected position.
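A minimal sketch of the expected-position computation of claim 4 (illustrative only, not part of the claims) is shown below, assuming calibrated 3-by-4 projection matrices for two of the image capture devices; the matrices and pixel coordinates are placeholders.

    import numpy as np
    import cv2

    # Hypothetical projection matrices for two cameras whose positions in the
    # physical space are known from calibration.
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])])

    # Matched image points for the object, shaped (2, N) as OpenCV expects.
    pts1 = np.array([[412.0], [300.5]])
    pts2 = np.array([[398.2], [310.1]])

    hom = cv2.triangulatePoints(P1, P2, pts1, pts2)   # 4 x N homogeneous coordinates
    expected_position = (hom[:3] / hom[3]).ravel()    # expected (x, y, z) in physical space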
5. A system, comprising: one or more processors; and memory including executable instructions that, if executed by the one or more processors, cause the system to: identify, based at least in part on specified criteria, a first digital representation in an image and a second digital representation in a second image; determine, based at least in part on a position of the first digital representation in the image, a first epipolar line in the second image; determine, based at least in part on a position of the second digital representation in the second image, a second epipolar line, the second epipolar line being in the image; determine at least one cost value based at least in part on: the first digital representation; the second digital representation; the first epipolar line; and the second epipolar line; determine, based at least in part on the at least one cost value, that the first digital representation and the second digital representation represent a same object; and associate, in a data store, the first digital representation with the second digital representation.
6. The system of claim 5, wherein: the executable instructions further include instructions that cause the system to determine transformations of the first digital representation and the second digital representation as a result of transforming: a first region of pixels within the image to fit a particular shape; and a second region of pixels in the second image to fit the particular shape; and the executable instructions that cause the system to determine the at least one cost value further cause the system to determine the at least one cost value based at least in part on sizes of the transformations.
7. The system of claim 5, wherein the specified criteria includes a characteristic usable to identify the first digital representation in the image, the characteristic being a size of a set of pixels, a shape of the set of pixels, or a color of the set of pixels.
8. The system of claim 5, wherein the first digital representation is one of a plurality of digital representations in the image, wherein each of the plurality of digital representations matches the specified criteria.
9. The system of claim 8, wherein: a plurality of cost values is associated with the plurality of digital representations, the plurality of cost values including the at least one cost value; and the executable instructions that cause the system to determine that the first digital representation and the second digital representation represent the same object include instructions that cause the system to apply an optimization algorithm to the plurality of cost values to determine that the first digital representation and the second digital representation represent the same object.
10. The system of claim 9, wherein the optimization algorithm is the Hungarian algorithm.
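For illustration only (not part of the claims), the Hungarian algorithm of claim 10 is available as SciPy's linear_sum_assignment; the cost matrix below is a hypothetical example in which rows index digital representations in the first image and columns index digital representations in the second image.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    # Hypothetical cost matrix of pairing costs (e.g., epipolar pixel distances).
    cost = np.array([[1.2, 7.8, 9.1],
                     [6.5, 0.9, 8.3],
                     [7.7, 6.2, 1.4]])

    rows, cols = linear_sum_assignment(cost)          # Hungarian (Kuhn-Munkres) assignment
    pairs = list(zip(rows.tolist(), cols.tolist()))   # pairs deemed to be the same object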
11. The system of claim 5, wherein the executable instructions that cause the system to determine the at least one cost value include instructions that cause the system to: determine a first pixel distance from the first digital representation to the first epipolar line; determine a second pixel distance from the second digital representation to the second epipolar line; and determine the at least one cost value based at least in part on the first pixel distance and the second pixel distance.
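A non-limiting sketch of the two pixel distances recited in claim 11, assuming a known fundamental matrix F and using OpenCV's cv2.computeCorrespondEpilines; the placeholder F and the point coordinates are hypothetical.

    import numpy as np
    import cv2

    def symmetric_epipolar_cost(F, p1, p2):
        # Distance from p2 to the epipolar line of p1 in the second image, plus the
        # distance from p1 to the epipolar line of p2 in the first image.
        pt1 = np.array([[p1]], dtype=np.float64)              # shape (1, 1, 2)
        pt2 = np.array([[p2]], dtype=np.float64)
        l2 = cv2.computeCorrespondEpilines(pt1, 1, F)[0, 0]   # line in the second image
        l1 = cv2.computeCorrespondEpilines(pt2, 2, F)[0, 0]   # line in the first image
        d2 = abs(l2[0] * p2[0] + l2[1] * p2[1] + l2[2]) / np.hypot(l2[0], l2[1])
        d1 = abs(l1[0] * p1[0] + l1[1] * p1[1] + l1[2]) / np.hypot(l1[0], l1[1])
        return d1 + d2

    F = np.eye(3)                                             # placeholder fundamental matrix
    cost = symmetric_epipolar_cost(F, (412.0, 300.5), (398.2, 310.1))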
12. The system of claim 5, wherein: the image depicts a scene from a first perspective; and the second image depicts the scene from a second perspective different from the first perspective.
13. A non-transitory computer-readable storage medium having stored thereon executable instructions that, if executed by one or more processors of a computer system, cause the computer system to at least: obtain a first image and a second image; determine a first point corresponding to an object in the first image; determine, based at least in part on the first point, a line in the second image; determine a second point corresponding to a set of pixels in the second image; calculate, based at least in part on a distance from the second point to the line, a cost of associating the set of pixels in the second image with the object in the first image; and determine, based at least in part on the cost, that the set of pixels represents the object.
14. The non-transitory computer-readable storage medium of claim 13, wherein the executable instructions further include instructions that cause the computer system to: perform a transformation to the second image that produces a second set of pixels that are associated with the set of pixels; and modify the cost based at least in part on a size of the second set of pixels.
15. The non-transitory computer-readable storage medium of claim 13, wherein: the cost is a first cost; the executable instructions further include instructions that cause the computer system to: determine, based at least in part on the second point, a second line in the first image; and calculate, based at least in part on a distance from the first point to the second line, a second cost of associating the object in the first image with the set of pixels in the second image; and the executable instructions that cause the computer system to determine that the set of pixels represents the object further include instructions that cause the computer system to determine, based at least in part on the first cost and the second cost, that the set of pixels represents the object.
16. The non-transitory computer-readable storage medium of claim 13, wherein the line is an epipolar line.
17. The non-transitory computer-readable storage medium of claim 13, wherein: the first image is an image of a scene captured by a first image capture device from a first perspective; and the second image is another image of the scene captured by a second image capture device from a second perspective.
18. The non-transitory computer-readable storage medium of claim 17, wherein: the executable instructions further cause the computer system to determine a correspondence between the first image capture device and the second image capture device based at least in part on a static feature represented in the first image and second image; and the executable instructions that cause the computer system to determine the line include instructions that cause the computer system to determine the line based at least in part on the correspondence.
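By way of non-limiting illustration of claim 18, the camera-to-camera correspondence can be expressed as a fundamental matrix estimated from static features matched across the two views; the matched pixel coordinates below are hypothetical, and OpenCV's eight-point estimator is one possible choice rather than a required one.

    import numpy as np
    import cv2

    # Hypothetical pixel locations of static scene features (e.g., fixtures or
    # floor markings) matched between the two cameras' images; the eight-point
    # method needs at least eight matches.
    pts1 = np.array([[100, 120], [340, 80], [400, 260], [150, 300],
                     [220, 40], [380, 330], [60, 210], [310, 180]], dtype=np.float64)
    pts2 = np.array([[110, 118], [355, 86], [410, 255], [160, 305],
                     [228, 46], [392, 324], [72, 206], [322, 176]], dtype=np.float64)

    # F encodes the correspondence and maps a point in one image to an epipolar
    # line in the other, from which the claimed line can be determined.
    F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_8POINT)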
19. The non-transitory computer-readable storage medium of claim 13, wherein: the set of pixels is a first set of pixels; the cost is a first cost; the executable instructions further include instructions that cause the computer system to: determine a third point corresponding to a second set of pixels in the second image; and calculate, based at least in part on a second distance from the third point to the line, a second cost of associating the second set of pixels with the object; and the executable instructions that cause the computer system to determine that the set of pixels represents the object include instructions that cause the computer system to determine, based at least in part on the first cost being lower than the second cost, that the first set of pixels represents the object.
20. The non-transitory computer-readable storage medium of claim 13, wherein: the first image is a member of a first sequence of images; the object is associated with a first trajectory within the first sequence of images; the set of pixels is associated with a second trajectory in a second sequence of images that includes the second image; and the executable instructions further include instructions that cause the computer system to, as a result of the set of pixels being determined to represent the object, associate the first trajectory with the second trajectory.
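Finally, a minimal sketch (not part of the claims) of the trajectory association of claim 20, using an in-memory dictionary as a stand-in for the data store; the camera names and trajectory identifiers are hypothetical.

    # Links per-camera trajectory identifiers that were determined to represent
    # the same object; a dictionary stands in for the data store.
    trajectory_links = {}

    def associate_trajectories(store, camera_a, trajectory_a, camera_b, trajectory_b):
        # Record the association in both directions so either trajectory can be
        # looked up from the other.
        store[(camera_a, trajectory_a)] = (camera_b, trajectory_b)
        store[(camera_b, trajectory_b)] = (camera_a, trajectory_a)

    associate_trajectories(trajectory_links, "cam-1", 17, "cam-2", 42)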